Oberlin College Geomorphology Research Group: May 2016

Sunday, May 15, 2016

Please someone tell me how to write generic code for specific tasks

Heyo, Joe here, coming off of a great semester of working with Marcus to coerce our data to play nice with Python. Our goal was to unite the geographic data we had for the area upstream of each point with the isotope data from Harbin, our hard-working germanium detector, into a format that could be sensibly manipulated with Python for graphing and statistical purposes. The problem was twofold: figuring out a way to sensibly store and access the data, and figuring out how to use that data to make graphs that were understandable and looked nice. I dove into writing a whole tangle of functions to pull out the data of interest, and Marcus became good friends with the matplotlib documentation, his only ally in the noble fight against the matplotlib library.

It must’ve been just about two years ago now that I first started to truly get my hands dirty with both Python and ArcPy, ArcGIS’s Python library. I started with a simple goal: create unique watershed files for each point in a shapefile full of sample collection locations. Through a combination of the ArcPy documentation, Stack Overflow answers, and a dear friend of mine with far more Python experience than I have, I was able to create such a script. It was tailored to my specific project, but I tried my best to make it something that could be reused for other projects. Looking back now, I would do it all totally differently, but ya live and learn!

When I began work on my next script, which extracted spatial information for each watershed, I became consumed with finding my way around ArcGIS’s ‘table joins’, which are perhaps the most obtuse way to unite two sets of data. I won't go into detail, but I accomplished my goal and learned a lot about how ArcGIS stores data in the process. Thus began the quest that still consumes me to this day: to avoid using ArcGIS at all costs, offloading as much work as possible to Python.

In the fall, I declared, to no one in particular, my intent to secede from ArcGIS, and began work on a Python project to manage my data. It would only dirty its feet by dipping into ArcGIS as needed for certain spatial analyses, then whisk the results out of the clutches of whatever heinous file Arc would create and into the sanctuary of my Python datatype. Progress was slow, mainly because I kept on trying to start over! My code worked fine, but I was never satisfied with how it was structured. I wanted this to be something that people doing similar, but distinct, work could use. I struggled with how to avoid design decisions specific to my project, which was hard to do when I was also using the code for my actual project at the same time. Eventually looming deadlines (apparently you need “results” when you “present” at a “conference”) forced me to move forward, so I ended the semester with a datatype to store the data about my samples, some functions to grab that data, and some functions to graph it.

Now, when you saw “functions to graph” above, you may have thought to yourself, “oh, this must be where matplotlib comes into play,” and you would be right, if I had an ounce of sense in me. For a reason I am unable to explain (I’m not sure if it was ignorance about the existence of graphing-specific libraries like matplotlib, hubris, or just naive fondness for LaTeX), I decided to write functions that generate the markup for plots using the PGFPlots package for LaTeX. This meant that instead of calling functions like plt.plot(), I was writing long format strings to generate a file in the LaTeX markup language. The results were rather pleasing, but when I came back to the project in February with Marcus I thought a more straightforward approach would be appropriate. When we found out about matplotlib, I thought, “Now here is the answer to all of our problems! All we have to do is hook up the Python code that stores the data to matplotlib and out will come beautiful graphs.” Sure, I thought we might have to do some tweaking to get graphs up to our very refined standards, but how hard could it be? For that answer, see Marcus’s post.
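For the curious, here is a rough idea of what the format-string approach can look like. This is only a minimal sketch, not the code from the project: the template, the function name pgfplots_scatter, and its arguments are all made up for illustration.

```python
# Hypothetical sketch: build a PGFPlots scatter plot as a LaTeX string.
# The template and function name are illustrative, not the project's code.

# Doubled braces ({{ and }}) are literal braces in str.format();
# single braces mark the fields that get filled in.
PGF_TEMPLATE = r"""\begin{{tikzpicture}}
\begin{{axis}}[xlabel={{{xlabel}}}, ylabel={{{ylabel}}}]
\addplot[only marks] coordinates {{
{coords}
}};
\end{{axis}}
\end{{tikzpicture}}
"""


def pgfplots_scatter(xs, ys, xlabel, ylabel):
    """Return LaTeX markup for a PGFPlots scatter plot of xs against ys."""
    coords = "\n".join(f"({x}, {y})" for x, y in zip(xs, ys))
    return PGF_TEMPLATE.format(xlabel=xlabel, ylabel=ylabel, coords=coords)


# Made-up numbers, purely to show the shape of the output.
print(pgfplots_scatter([1, 2], [3.5, 4.0], "x", "y"))
```

The returned string still has to be pasted into a LaTeX document that loads the pgfplots package and then compiled, which is a good part of why a single call like plt.plot() started to look appealing.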

The template...

...and the result!

So, as Marcus went off to figure out just how to make matplotlib give us graphs that could be read with ease, I went off to figure out how I could pull the data we wanted out of the jumble of samples we were working with. Our dataset was a collection of 83 soil samples from three different field seasons. For each sampling location we determined the area upstream of it, and calculated various geographic parameters.

Now that I had gotten all this data, it was time to organize it logically. This took some time, but boy was it worth it. Once I knew that all the data would behave the same way, I wrote a series of functions (way too many functions, probably, but once you learn Lisp, there’s no going back) to return lists of the data we actually wanted to plot. If each sample has activity and error values for 3 different isotopes, a thousand different geographic parameters, a location, links to files, and lists of other samples that it is related to, it’s not quite plug-and-play. But it got done, and it meant that if some of the values for our samples changed (as they often do), or if new samples got added or old samples got removed, we didn’t have to do a darned thing, as long as they conformed to the standards! Just take the list of sample objects, plug it into the function that pulls out the data you want to graph, and then shoot the result of that into Marcus’s graphing code. Badabing badaboom!
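To make that pipeline a little more concrete, here is a minimal sketch of the idea. The Sample class, its field names, the paired_values function, and the "mean_slope" parameter are all hypothetical stand-ins, and the numbers are made up; the real datatype held a lot more than this.

```python
from dataclasses import dataclass, field


@dataclass
class Sample:
    """Hypothetical stand-in for one soil sample."""
    name: str
    activity: dict   # isotope name -> measured activity
    error: dict      # isotope name -> measurement error
    geography: dict = field(default_factory=dict)  # parameter name -> value


def paired_values(samples, isotope, parameter):
    """Return (xs, ys, errs) lists for samples that have both values.

    Samples missing either the isotope or the geographic parameter
    are skipped, so adding or removing samples needs no other changes.
    """
    xs, ys, errs = [], [], []
    for s in samples:
        if isotope in s.activity and parameter in s.geography:
            xs.append(s.geography[parameter])
            ys.append(s.activity[isotope])
            errs.append(s.error.get(isotope, 0.0))
    return xs, ys, errs


# Made-up values, purely for illustration.
samples = [
    Sample("A", {"lead": 1.0}, {"lead": 0.1}, {"mean_slope": 5.0}),
    Sample("B", {"lead": 2.0}, {"lead": 0.3}, {}),  # no slope, so skipped
    Sample("C", {"lead": 3.0}, {"lead": 0.2}, {"mean_slope": 7.5}),
]
xs, ys, errs = paired_values(samples, "lead", "mean_slope")
```

The three lists that come back are exactly the kind of thing a plotting function wants as input, which is what keeps the data side and the graphing side from having to know about each other.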

Things should be smoother from here on out...until I finally figure out that perfect structure and write the definitive program for managing soil samples, computing watersheds, doing some spatial analyses, and plotting and tabulating the data. Someday it will happen, and we will be better for it.

It's been a great 2+ years working for OGRe, but I wouldn't be surprised if I come back in one form or another, even if it's just to preach about why we should be scripting more and clicking less.

Thursday, May 12, 2016

Lab Work (Spring 2016)

Hi all,

Marcus here, and ready to share with you what I've accomplished while working in the lab this Spring. This project was quite an undertaking from the very beginning; I had just conquered the mighty task of “Hello World” in Java when I found out that I’d be paired with Joe to prevent more Excel-made graphs from entering publication. We were given the option to either work in R, a completely unfamiliar language to both Joe and me, or to see if there were any ways to get Python to cooperate. Enter matplotlib (MPL).

This library was so combative that we had to make use of a separate application called Jupyter, which already had MPL integrated in it, to start working. Determining a starting location was a task in itself. Our first graph made extensive use of an oh-too-kind Stack Overflow user's code, which gave us a 7x7 grid of information. That was a lot to take in, so we started looking into how to get more specific with what we were presenting. The next graphs we created specifically targeted lead, comparing in-channel vs overbank and resample vs original, including variants that accounted for negative values and variants that didn’t. The nature of working with lead values is that they all had crazy error bars, which got to be visually distracting. Suffice it to say, there were a lot of moving pieces that didn’t want to work together at first.

This is when we started changing up our approach: Joe put me more in charge of creating the graphs themselves while she continued working on her already-built code that was able to grab and pair related information sets. My job required me to understand what data I was being given and how to use it in the graphs I would be creating, so naturally I needed to understand at least some parts of the code Joe was already working on. Now my to-do list included: learning how Jupyter worked, learning how to use matplotlib, and understanding how Joe’s code was structured. Maybe it would’ve been easier to work in R. Jokes aside, it was a somewhat daunting task, so I figured it’d be best to start by seeing how Joe’s code worked, as she was much more accessible than the authors of the other two applications.

It was a really interesting experience getting to see how Joe went about setting up her code to retrieve data and return meaningful results. The structure she had set up worked in a fairly straightforward way, so figuring out how to add to and build off of what was there already wasn’t too difficult. It was a nice introductory period before we got our hands dirty with matplotlib. As I mentioned earlier, we started off making plots that looked nothing like what our final graphs became. It was definitely a learning experience as we figured out what worked best and how to get information-dense graphs that were still intelligible. Stack Overflow and the many documentation web pages quickly became purple links in my Google searches as to why this wasn’t working or how to change that seemingly obvious part of the graph. Changing from regular plots to scatterplots, playing with subplots and legends, and even changing the font size, colors, and marker shapes all seemed to have specific intricacies that produced unintuitive results for seemingly small modifications. Ultimately though, Joe and I were able to work through these bugs as they popped up and create some good figures in the process.
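For anyone facing the same fight, here is a minimal sketch of the pieces mentioned above working together: side-by-side subplots, scatter-style markers, toned-down error bars, legends, and explicit font sizes. It is not our actual graphing code. The function name plot_groups, the styling choices, and every number in it are assumptions made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the sketch runs without a display
import matplotlib.pyplot as plt


def plot_groups(groups, xlabel, ylabel, fontsize=10):
    """Draw one scatter panel per group, with faint error bars.

    groups maps a panel title to a tuple (xs, ys, errs).
    """
    n = len(groups)
    # squeeze=False always gives a 2D array of axes, even for one panel.
    fig, axes = plt.subplots(1, n, sharey=True, figsize=(4 * n, 3), squeeze=False)
    for ax, (title, (xs, ys, errs)) in zip(axes[0], groups.items()):
        # fmt="o" draws markers only; light gray error bars stay readable
        # without drowning out the points themselves.
        ax.errorbar(xs, ys, yerr=errs, fmt="o", markersize=4,
                    ecolor="lightgray", elinewidth=1, capsize=2, label=title)
        ax.set_title(title, fontsize=fontsize)
        ax.set_xlabel(xlabel, fontsize=fontsize)
        ax.legend(fontsize=fontsize - 2)
    axes[0][0].set_ylabel(ylabel, fontsize=fontsize)
    fig.tight_layout()
    return fig


# Made-up numbers, purely for illustration.
demo = {
    "in-channel": ([1, 2, 3], [4.0, 5.5, 3.2], [1.5, 2.0, 1.0]),
    "overbank": ([1, 2, 3], [6.1, 4.8, 5.0], [2.2, 1.1, 1.8]),
}
fig = plot_groups(demo, "x (made-up units)", "lead activity (made-up units)")
```

Calling fig.savefig with a filename writes the figure to disk. In a Jupyter notebook the matplotlib.use("Agg") line can be dropped so the figure shows up inline.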

This semester was a great introduction to working in a lab and I'm excited to see what the next several years will bring!

Tuesday, May 10, 2016

STEM Night

On Friday 6 May, the Geomorphology Group joined other geology students to participate in STEM night, an outreach event for 3rd-5th graders. We had a great time. Below are pictures of OC students interacting with elementary school students. They are playing with the stream table (with Marcus and Adrian), exploring mineral properties (with Andrea), and looking at fossils (with Alex). Sydney and Andrew aren't pictured.