Data – Intraday Trading Data and R

Posted: May 15th, 2012 | Author: genevieve | Filed under: Data, Thesis

So a few weeks ago I had the good fortune to get my hands on intraday trading data, due to the kindness of the folks at Nanex Research. I asked for a plain old text file rather than learn their custom software and API, and what I received was a 23 GB text file. Now, it’s a bit difficult just to open a 23 GB text file and see what’s inside, since most programs try to load an entire file into memory before showing it.

But I was excited to see what was inside nonetheless. This was also the dataset that I’d been waiting all semester to receive, and the one that I was looking forward to exploring more in depth in Mark Hansen’s Data class. Unfortunately, it came to me when there were only about 2 weeks left in the semester, but Mark was kind enough to sit down with me and go over some basic techniques of how to open, parse and model the data inside this massive file.

So, first of all there are some helpful commands in Terminal which allowed me to see what was inside the file, and then make smaller files that R could actually open. For instance, just to see what the file contained, I typed:

head 'filename'

Then, in order to see more of the file, I typed:

head -n 10000 'filename' > tenthousandlines.txt

This gave me a new text file, named “tenthousandlines.txt” which contained the first ten thousand lines of the original file.

Another useful command is grep. Grep lets you search for a term, and returns all lines that contain that term. After looking through the head file with Mark, we noticed that one of the first actual trades was of the stock symbol eNOK, which stands for Nokia. So we searched the big file for “eNOK” and saved all the lines containing that term into a new file.
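A minimal sketch of that grep step. The three-line sample file and the second symbol in it are made up for illustration, and the pipe-delimited layout is inferred from the R output later in this post.

```shell
# Hypothetical stand-in for the 23 GB file.
cat > sample.txt <<'EOF'
MQ|NYSE|eNOK|04:00:00.175|04:00:00|PACF|4.03|190|4.05|10
MQ|NYSE|eIBM|04:00:00.200|04:00:00|PACF|200.10|100|200.50|100
MQ|NYSE|eNOK|04:00:01.125|04:00:01|PACF|4.03|290|4.05|10
EOF

# -F treats the pattern as a fixed string rather than a regular expression.
# Every matching line is written to enok.txt.
grep -F 'eNOK' sample.txt > enok.txt

# -c prints only the number of matching lines.
grep -c -F 'eNOK' sample.txt
```

Because grep reads the file line by line, this kind of search should work on a file far larger than available memory.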

Also, by typing in ‘tail’ we were able to check the last timestamp in the large file, and it turns out that a 23 GB file of electronic exchange activity (on what day I’m not sure) translates to about 4 hours’ worth of data.
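One way to compare the first and last timestamps without opening the file is sketched below. The sample lines are hypothetical, and treating field 4 as the millisecond timestamp is an assumption based on the R output later in this post.

```shell
# Hypothetical three-line stand-in for the big file.
cat > sample.txt <<'EOF'
MQ|NYSE|eNOK|04:00:00.175|04:00:00|PACF|4.03|190|4.05|10
MQ|NYSE|eNOK|04:00:01.125|04:00:01|PACF|4.03|290|4.05|10
MQ|NYSE|eNOK|07:59:59.950|07:59:59|PACF|4.07|500|4.09|500
EOF

# Take the first and last lines, then keep only the 4th pipe-delimited field.
head -n 1 sample.txt | cut -d'|' -f4 > first_time.txt
tail -n 1 sample.txt | cut -d'|' -f4 > last_time.txt

cat first_time.txt last_time.txt
```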

We also made a file that counted how many lines there were with the same timestamp (in seconds). The resolution of the exchange data is every 25 milliseconds, but we wanted to get a feel for how much activity there was at different times of the day.
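A per-second count file like that could be built roughly as follows. This is a sketch, not necessarily the exact command we used: it assumes field 5 holds the timestamp truncated to seconds, and the sample lines are hypothetical.

```shell
# Hypothetical stand-in for the big file; field 5 is assumed to be the
# timestamp truncated to whole seconds.
cat > sample.txt <<'EOF'
MQ|NYSE|eNOK|04:00:00.175|04:00:00|PACF|4.03|190|4.05|10
MQ|NYSE|eNOK|04:00:00.900|04:00:00|PACF|4.03|290|4.05|10
MQ|NYSE|eNOK|04:00:01.125|04:00:01|PACF|4.03|290|4.05|10
MQ|NYSE|eNOK|04:00:03.875|04:00:03|PACF|4.06|500|4.09|500
MQ|NYSE|eNOK|04:00:03.875|04:00:03|PACF|4.07|500|4.09|500
MQ|NYSE|eNOK|04:00:03.900|04:00:03|PACF|4.07|500|4.09|500
EOF

# uniq -c collapses runs of identical adjacent lines and prefixes each with
# its count. That is enough here because the file is already in time order.
cut -d'|' -f5 sample.txt | uniq -c > count.txt

cat count.txt
```

Each output line is a count followed by the second it belongs to, so the count lands in the first column when the file is read into R. Seconds with no activity at all (04:00:02 in the sample) simply do not appear.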

Next we brought the files into R.

We took a few approaches to understanding what was in the data. Keeping with the quote activity over the course of 4 hours, we first graphed that activity by typing:

cntPerSec = read.table("Desktop/count.txt")
plot(cntPerSec$V1,type="l")

This yielded a plot that looks like this:

We wanted to get a better look at what was happening in the later part of the data, so we typed:

plot(cntPerSec$V1[12000:15000],type="l")

Which gave us a plot that looked like this:

Then we did the next logical thing, which was to plot it as a histogram:

hist(cntPerSec$V1)

This looked like the familiar “hockey stick” so we did the obvious next step, graphing the log histogram:

hist(log(cntPerSec$V1))

This gave us a very normal looking distribution:

So normal, that Mark showed me how to plot the data as a QQ-Norm, which from a little googling is used to compare the distribution of the data to another distribution, in our case to see whether the data follows a normal distribution (right Mark?)

So we typed:

qqnorm(log(cntPerSec$V1))
qqline(log(cntPerSec$V1))

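Which gave us the QQ-norm plot and a reference line. (By default qqline draws its line through the first and third quartiles of the data, rather than fitting a linear regression.)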

So the timing of things was one way to approach the dataset, but we also looked at a more traditional way of approaching financial data, stock by stock. Using the eNOK file we made from the grep commands in terminal, we took that into R, and looked at what was there.

The main thing we did was plot the price of eNOK stock over the course of the 4 hours. We typed:

> enok <- read.table("Desktop/enok.txt",sep="|",as.is=T)
>
> head(enok)
  V1   V2      V3           V4       V5   V6   V7  V8   V9 V10
1 MQ NYSE eNOK    04:00:00.175 04:00:00 PACF 4.03 190 4.05  10
2 MQ NYSE eNOK    04:00:01.125 04:00:01 PACF 4.03 290 4.05  10
3 MQ NYSE eNOK    04:00:03.875 04:00:03 PACF 4.06 500 4.09 500
4 MQ NYSE eNOK    04:00:03.875 04:00:03 PACF 4.07 500 4.09 500
5 MQ NYSE eNOK    04:00:13.100 04:00:13 PACF 4.08 150 4.09 500
6 MQ NYSE eNOK    04:00:13.125 04:00:13 PACF 4.07 500 4.09 500
> plot(enok$V7,type="l")

And plotted this chart:

While it’s not necessarily the most interesting thing to learn about the data, to my delight, the chart looks a lot like the Nanex charts I’ve been poring over the whole semester. So I realized that using R is the way it’s done.

A huge thanks to Mark Hansen for encouraging me to keep asking for data even after getting refused numerous times. I’m looking forward to using the techniques he showed me to keep poring through this dataset. My first plans will most likely be a sonification of actual trades through the constant noise of orders posted and cancelled. But I’m very excited to take a look at the activity happening at a sub-second timescale. It’s pretty mind-boggling, but I guess that’s how 4 hours fills up a 23 GB text file.


Research Studio Update

Posted: April 5th, 2012 | Author: genevieve | Filed under: Research Studio Algorithms, Uncategorized

This week has been mostly spent finishing research for my paper, and starting to write the rough draft. I had a Skype call with Piero La Mura, who is one of the leading experts on quantum computing and the financial system. His research and our talk were very interesting, but I’m not entirely sure whether I can incorporate them into my work at this moment, or whether they will lead to projects in the future.


Abstract – Research Studio: Algorithms

Posted: March 24th, 2012 | Author: genevieve | Filed under: Research Studio Algorithms

High Frequency Geography: Mapping the Materiality of the Global Financial System

This paper focuses on recent directions in my artistic practice exploring the relationship between finance and geography. In the past 15-20 years, the global financial system has come to rely more and more on computer mediated trading practices, known as algorithmic trading. Within this larger field, one area that has generated a particular amount of interest is high frequency trading, where profits hinge on the speed with which algorithms can react to fluctuations in market prices. In a global economic system where activities in Tokyo might affect algorithmic decisions in New Jersey, High Frequency Trading has come up against the fundamental limit of physical reality – the speed of light. This paper describes how trading activities have shifted away from human actors on trading floors towards algorithmic actors inside data centers, with a focus on the underlying infrastructure that these electronic trading exchanges rely on. I frame my research within a discussion of previous works by artists addressing the complex global financial system.


Mapping

Posted: March 21st, 2012 | Author: genevieve | Filed under: Research Studio Algorithms, Thesis, Uncategorized

Two serendipitous things happened today.

Toby sent me an article on a plan in the works where three different companies, one Russian, one Canadian and one American, are all investing heavily in laying down high speed fiber optic cables that would traverse the Arctic Circle and provide much faster connections between the US, Western Europe and Japan. Despite the huge investment and undertaking (each cable is estimated to cost between $600 million and $1.5 billion, and will reduce latency between London and Tokyo by 30%), this is only possible due to global warming and polar ice caps receding significantly in the last few years.

I believe that this cable, this story, really brings together the connections between the financial system and the environment that I’ve been trying to deal with in more metaphorical ways (with the Unity landscape). Currently, I’m trying to figure out how to relate the speed gained by investing billions into these cables, an idea only made possible by human impact on our planet, and the effects that the financial system (and the infrastructural feats we are willing to do in its name) will continue to have on the environment. Here is the map of the planned cable.

I also met with Heather earlier in the day about my paper topic for her research studio class and my thesis project. Since I haven’t actually completed the projects I want to make, I can’t necessarily write the artist’s paper she had in mind for me. I still want to frame what I’ve been researching in that vein, but I may have to focus on other artists who have tried to do similar things.

So that got me looking at a book I’ve had for a while called Else/Where: Mapping New Cartographies of Networks and Territories. Flipping through, I came across a diagram that immediately resonated with me in light of the previous map.

The diagram, called “Centers and Peripheries,” was originally made by geographer Denis Retaillé in 1992, but included in a 1994 volume on the “globalization of capital” by the economist François Chesnais. In his chapter “Counter Cartographies” Brian Holmes discusses the map.

This map shows three things. First, a circuit linking the United States, Western Europe and Japan, the so-called “Triad” regions, which form a “global oligopoly” accounting for the majority of industrial and financial exchanges. Second, the major nodes of the world network, represented by densely outlined circles. And third, the hierarchical relations between the regions, as described with these categories: center; periphery integrated to the center; annexed periphery; exploited periphery; abandoned periphery. Chesnais performs a Marxist analysis, showing how globally fragmented production lines are coordinated through the computerized circuits of the financial sphere. His map describes the hierarchy of social relations in a post-national era, when no political formation can erect any substantial barrier to the dictates of capital. And it reveals the near-perfect correlation between the graph of virtual flows and the geography of human exploitation.

I need to think about the relationship of these diagrams a bit more, but it’s as if one is predicting the existence of the other.


Post Spring Break Thesis Update

Posted: March 19th, 2012 | Author: genevieve | Filed under: Thesis

There have been some positive developments in the past few weeks, but since I spent my Spring Break in Austin I did not get as much work done on my thesis as I hoped to.

Access to Intraday Trading Data
I am now in touch with Knight Capital to get one day’s worth of exchange data. When I met with Marius Watz for my 1-on-1 meeting a few weeks back, he mentioned that he’d done a project for them, called Stockspace. He told me to drop his name and see if they’d share data with me like they did for him. Finally, after a few weeks of emails, they are working to get me data for a day’s worth of exchange activity, as well as one day’s worth of Knight trading activity. This will really help me get a feeling for the intraday behavior of the trading exchanges, which is especially important since high frequency traders buy and sell during the day, but try to leave their position “flat” at closing. Basically, this means they don’t want to end the day holding shares of stock whose values might drop. I hope to use Mark Hansen’s Data class to really explore the intraday data and see what kinds of patterns and visualizations emerge.

I may use this data to animate a landscape in Unity similar to the one I’ve been experimenting with this semester. Here is a video of the latest version, though it’s still not where I envision it in the long run.

Hunting down data for Colocation Map
I spent a lot of my Spring Break hunting down data on the locations of trading exchanges, as well as colocation facilities that service the proprietary trading firms that engage in high frequency trading. Following Dave Boyhan’s suggestion, I did a lot of whois.com searches on the IP addresses from this not completely comprehensive list of these prop trading firms. The searches return latitude and longitude coordinates, but I’m not quite sure if they’re really accurate. I’m also torn about whether to list the trading firms at their official addresses, or to track down where they colocate. I will try and do both, but again, there’s a lot of private data so I’m also limited to what I have access to.

Spread Networks, which is currently the fastest fiber cable between New York and Chicago, making a roundtrip in 13.10 ms, updated their website recently and made a lot more of their locations public. They also posted a map of “amplification sites” along the way. This makes me really want to roadtrip to all of these towns, but I’m afraid that dream may have passed since I spent my Spring Break eating tacos in Austin.

Still, it’s great that they’re publishing more of their data. I’d like to contrast the Spread Networks cable with a few that are a tiny bit slower, but are much cheaper because of it. I’m assuming that the real HFTs use Spread Networks since even a 1 ms advantage is worth the cost in most cases.
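For a rough sense of how close a 13.10 ms round trip is to the physical limit, here is a back-of-the-envelope calculation. The inputs are assumptions of mine rather than Spread Networks figures: a straight-line New York to Chicago distance of roughly 1,150 km, and a refractive index of about 1.47 for optical fiber.

```shell
awk 'BEGIN {
    c_km_per_ms = 299.792458   # speed of light in vacuum, km per millisecond
    distance_km = 1150         # ASSUMED straight-line distance, New York to Chicago
    fiber_index = 1.47         # ASSUMED refractive index of optical fiber

    # Round trip is twice the one-way distance; light in fiber travels at
    # c / index, so the fiber time is the vacuum time multiplied by the index.
    vacuum_rt = 2 * distance_km / c_km_per_ms
    fiber_rt  = vacuum_rt * fiber_index

    printf "vacuum round trip: %.2f ms\n", vacuum_rt
    printf "fiber round trip:  %.2f ms\n", fiber_rt
}' > latency.txt

cat latency.txt
```

Under those assumptions the straight-line fiber figure comes out a little over 11 ms, which would leave only a couple of milliseconds in the quoted 13.10 ms for route detours and equipment along the way.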

In addition to the Klondike Gold Rush map, I’m looking at some maps Stamen made of London, in which they made a heat map corresponding to the time it takes to commute downtown.

GPS Spoofing
I’ve spent a while weighing the pros and cons of this since I last presented it in Thesis class a few weeks ago. I’m still fascinated by the extreme importance of time keeping in HFT, and the fact that it needs to be accurate down to the microsecond. I have ideas like making a Time Microscope or a High Frequency Time Machine, which would be imaginative objects that address the conflict of humans trying to experience computer time, or the lengths that traders might go to find that edge.

When I talked about the GPS Spoofer in Crit Group, I received interesting feedback that made me question why I wanted to make it. Abigail Simon said that the financial system is vulnerable (and problematic) due to larger issues than GPS timekeeping. I’m torn between exploring this technical facet vs. pushing some more conceptual ideas I have, especially about the relationship between finance and landscape/geography.

I’m still going to try and make the spoofer, and will hopefully have a report on that soon. I’m just not sure how to frame it. Is it a “guerilla weapon?” A sculpture? Do I have to test it in order for it to be worth doing?

Research Group Paper
I’m going to take the paper I’m writing in Heather Dewey-Hagborg’s Research Studio as an opportunity to write what I’ve been thinking about High Frequency Trading down in some kind of more formal way. I will write an update with more specific details, but I see this paper as both explaining High Frequency Trading to a general audience, as well as writing about the issues of time and space, geography and value that my thesis is exploring.


Research Studio Update Week 7

Posted: March 14th, 2012 | Author: genevieve | Filed under: Research Studio Algorithms, Thesis

This past week has been a whirlwind of speaking to experts and consultants about my research. In chronological order, these are the people I’ve spoken to with a few notes from our conversations.

Nancy Nowacek
Meeting with Nancy was wonderful. She immediately got my concept and was really good about offering references that she thought might be relevant. The first thing we discussed was waterlots, and a project we’d both seen at the CCA Curatorial MFA show in 2010 (I guess we’re both from the Bay Area). Sandra Nakamura makes installations with pennies that represent larger value connected with prices of land. For this piece she used a grant from CCA for the amount it would have cost during the Gold Rush to buy the waterlot it sits on, then turned this amount into pennies.
She also referenced the Propeller Group, a collective from Vietnam who were in the last Triennial at the New Museum. They did a project where they re-branded Communism, overlaying two opposing forces in a way that makes the viewer confront the absurdity of the capitalist machine.
She told me to look at Carsten Höller’s work, since his approach to his artwork is very grounded in his scientific background. She told me that the slides are about doubt, which is an interesting and non-interactive take on them.
Find a poetics about the technical
Try to make connections; 5-6 thought experiments
Do something that confronts the body – where the emotion lies
Look at Xavier Le Roy (choreographer) and Cassie Thornton, who has an excellent project where she turns people’s debts (bank statements, bills) into nuggets of papier-mâché gold, or bling.
Exercise to examine the core mechanics of a prop – light, heavy, bouncy. Emotion as an end goal.

Sean McIntyre
Sean McIntyre, a first year who does a lot of work with mesh networks, had a previous life as a high frequency trading programmer. It’s sort of the best case scenario, since he gets what we do here at ITP, and he’s not under an NDA like all current high frequency traders. He was nice enough to sit down with me last week and tell me a little bit about how the system worked (from his experience). He worked at Virtu, and did a lot of quality assurance, which was basically making sure that the algorithms worked properly before they “unleashed the beasts” (his words). He confirmed that they colocated their algorithms in three locations – Carteret, Weehawken and Secaucus – most likely so that they could communicate with the trading exchanges nearby. At this point, NYSE didn’t have their Mahwah data center built yet (2008-2010).
GETCO was the company to beat
Bankruptcy in seconds (if things went bad)
Arbitrage across data centers
Citigroup stock was consistently in the top 5 for volume. Volume was the biggest indicator for HFT, much more important than closing price. They liked Citigroup cuz it was relatively cheap, and predictable.
In terms of liquidity rebates and transaction fees (other factors besides bid and ask spread that affect HFT algo decision-making), these are negotiated individually between the exchange and each trading company. This is one of the reasons that Virtu poached Chris Concannon, a former VP of Nasdaq: his connections and ability to negotiate better prices for their company.
Rebates are tiered according to a firm’s performance. More volume, lower fees
Chronos – their name for a time-based strategy
Algos usually incorporated multiple strategies
Not so sure about the HFT algos that lure others by buying and cancelling – you can easily piss off an exchange by spamming them with buy/cancel orders
HFT has a data processing problem
Nasdaq exchange protocol = FIX protocol
Every exchange has a different standard of sending messages, need to figure out how to get them all to talk to each other
UDP
Messages from data center in 1 of 2 formats
Whole book or stock specific

Petter Kolm
Last Thursday I spoke with Petter Kolm from the Courant Mathematical Finance Dept, which has direct connections to Wall Street firms.
He told me that HFT algorithms are actually not that complex, just operate really fast
He suggested that I might model one simple system – and change parameters over time
If volatility in the market goes up, algorithms become more aggressive
trading – sell-side activity, service to customers to minimize transaction costs – “agency algorithms, sell-side algos”
Aggressive HFT – one strategy is to pick off those large orders, and buy ahead in order to sell them the stock they want at a profit
Passive – place limit orders in the book
Limit – spread based on supply/demand
Aggressive HFT – instead of providing liquidity, you take it
Prop Trading Firms, Hedge Funds & HFT firms all employ different strategies at different frequencies
Market Making – they can be at the top of the limit order, buy low, sell high
colocated latency, 3-4 ms in exchange, longer if outside
Dark Pools – 30 dark pools exist
-they’re listed on the web
-just “another form of electronic trading”
-allow people who want to trade larger amounts of shares at once to execute them in one go, without the market seeing it and changing the price of the trades – “slippage”
-in dark pools, trades are marked at the midpoint between the bid and ask prices, don’t have to reach the ask.
-in reality, the avg size of a trade in dark pools isn’t as large as what they were designed to accommodate

Marius Watz
Very helpful, offered to put me in touch with Knight Capital Group, who he did dataviz for. They gave him a full day of intraday trading data for various stocks to visualize. He said I could use his name and perhaps they’d offer something similar.
He said the map sounded really interesting, referenced “They Rule”, Kevin Slavin, etc. Something he would “tweet”
He said to look at the Nanex crop circles and pick them apart with someone who might know what’s going on – they could be an opportunity to visualize

Tom Igoe
Went to office hours with Tom for his advice on the “understanding networks” angle to my research and project. He had some great ideas in terms of distilling my concept and how to proceed
Some notes from our meeting:
Not just actual locations, but distances/speeds in relation to how fast packets can travel. For instance, speed of fiber cable will connect the same two locations at different rates.
How has HFT changed the daily workflow for traders? (Trying to see if it has effects on human actors, or if things have changed over time because of it)
In addition to talking to quants, reach out to fund managers to see if this has changed the way they manage their team?
How might it affect a fund manager vs a specific stock trader?
If HFT injects liquidity it also injects volatility
Look at the GPS Spoofer article again with the faster-than-light neutrino debunk in mind – basically, they thought they had found a particle that went faster than the speed of light, but really it was an error in their GPS-based timing
What do I need to ask a quant vs a trader?
Have the principles behind shorting changed (because of HFT?)
How have human trading patterns changed since HFT
Arbitrage – pure inter market arbitrage, other strategies
Tell me about some of the different methods that traders use, pattern of those methods over time, both manually and algorithmically

In addition to these first person sources, I have also been reading about human perception of time, and the time it takes to process actions and our consciousness of our actions. I read a chapter from The User Illusion, by Tor Norretranders, that described an experiment done by Benjamin Libet in which he attempted to determine the time and order of people’s consciousness of their own actions. Essentially, people react before their conscious brains do, so the whole idea of conscious action and agency generates from impulses in the body, where our brains explain them by saying they “wanted” to do something.

Another interesting thing the reading referenced is Wilhelm Wundt’s complication clock – which is a clock that takes about 2.56 seconds to make a full rotation. People can still visually see the 3 o’clock, 7 o’clock (etc) spots around the clock, so that they can pinpoint smaller amounts of time more easily than just trying to sense what time it was when they made a decision.

I’m also reading up on whitepapers about GPS and different high frequency trading strategies, which I will summarize in another blog post.


Thesis Update

Posted: February 27th, 2012 | Author: genevieve | Filed under: Thesis

So in the past few weeks I’ve made some progress on my direction and what I plan to make by Thesis Week.

For the map / visualization of high frequency trading and the importance of location, I’d like to create something in the style of this map, or board game, from the time of the Alaskan Gold Rush.

I’ve made some headway and found a listing of proprietary trading firms, some of which specialize in high frequency trading. I’ve started to find all their locations, though some gaps are still there. I’m also trying to locate the data centers for all of the trading exchanges in the US as a starting point, then perhaps globally if I can manage.

I also started moving in a bit of a different direction. Just as I am interested in the importance of location for high frequency trading, I’m also interested in the importance of time. HFT exists in a timescale that is below the limits of human perception. There is an excellent article that a lot of people were kind enough to forward to me last week, which pinpointed the differences in algorithmic behavior at timescales below 650 milliseconds, which is the time it takes a chess grandmaster to realize a mistake. At this timescale, the algorithms are just trying to interact with one another as opposed to human traders. Most interesting is that the fractal patterns that explain the market don’t hold up at these sub-second timescales. As the researcher states, they “broke the fractal.”

This got me thinking about the behavior at millisecond increments of the market, but it also got me thinking about the importance of timekeeping in order to keep high frequency trading running. Then I came across this article, which REALLY got me thinking about how important timekeeping is to our financial system. From the GPS.gov website:

Each GPS satellite contains multiple atomic clocks that contribute very precise time data to the GPS signals. GPS receivers decode these signals, effectively synchronizing each receiver to the atomic clocks. This enables users to determine the time to within 100 billionths of a second, without the cost of owning and operating atomic clocks.

HFT uses a combination of a network timing protocol and GPS to keep time. The network protocol functions along the cable, while the GPS checks the accuracy on either end of the exchange. Since GPS receivers need to point at satellites, they are usually located on the roofs of trading exchanges. To read more about the methods for timekeeping in HFT, this is a great article.

The methods by which the researchers achieved this are a bit out of my means, but I’ve found a few hacker articles which gave me hope that I could theoretically interact with the market by confusing the GPS signal. I plan to test this in a controlled environment, and construct a sculpture that would have the potential to cause disruption, without ever actually using it for that purpose.

So my new plan is to make a sculpture that would theoretically shift the timestamp of algorithmic trades, and confuse the market. I would not use this, but I think making it points out the vulnerabilities in the system, and the reliance that our financial system has on technologies that we don’t realize are involved.

Lastly, here are some screenshots, and links to a video piece I made. It’s a landscape generated from financial data. I hope to refine this piece, but I’m not sure it’s quite in the same vein as the other pieces I plan to build for my thesis.


Research Studio Week 4

Posted: February 22nd, 2012 | Author: genevieve | Filed under: Research Studio Algorithms

Here are some thoughts on three papers related to the qualities of high frequency trading when analyzing them in terms of small increments of time at incredibly high speeds.

High frequency trading has been in the news as of late. People have been forwarding me a few insightful articles that led me to new journal articles. This well-written article referenced this paper:
Hasbrouck, J., & Saar, G. (2011, September). Low-Latency Trading.

It gave me a better understanding of the difference between what they call Agency Algorithms (which I believe other papers have referred to as ‘passive’ algorithms), and Proprietary Algorithms. Agency algorithms are used by large institutions when buying or selling many orders at once, in order to time them so as to “reduce slippage,” and keep as much profit on the order as they can. These algos still look to larger market trends, and might suggest to a human trader which stock(s) to buy or sell, but the trader would most likely determine volume, then execute the order via Agency Algos.

Proprietary Algorithms are what actually qualify as “low-latency algos” or aggressive high frequency trading. These try to game the speed of the system itself, baiting other algorithms to place an order, so that they can pounce and do it first. The patterns of these algorithms are a lot of buy-cancel-execute orders in millisecond periods of time, in an attempt to confuse the other algorithms out there and profit before they can.

Another article that many people have emailed me is Wired’s article on how HFT could negatively affect markets. The article was primarily a synopsis of this paper:
Johnson, N., Zhao, G., Hunsader, E., Meng, J., Ravindar, A., Carran, S., & Tivnan, B. (n.d.). Financial black swans driven by ultrafast machine ecology. Physics.

I found this paper incredibly compelling. The authors look at periods of time less than 650 ms, which is the threshold of human response time. As an example, they cite that 650 ms is the time it takes a chess Grandmaster to realize they are in trouble. This gives context to the transition away from “traditional human-machine systems,” where human oversight is possible if changes to the system are observable within human response time.

They describe the global financial system as governed by “the self-organized activity of a global collective of trading agents, including both humans and machine algorithms.” Since this system operates without much oversight or “real-time controller,” the study urges researchers to develop a “scientific theory for the underlying human-machine ecology on these ultrafast timescales.”

They use the term “black swan” to describe events in the market that reflect extreme volatility, or jumps in pricing. Their definition: the stock price had to tick down or up at least ten times before ticking up or down, and the price change had to exceed 0.8%. They reference Francis Bacon in their interest in studying these “black swan events” as “it is in such moments that a complex system offers glimpses into the true nature of the underlying fundamental forces that drive it.” They also determine that the nature of black swan events changes fundamentally as “the duration threshold is reduced beyond typical human reaction times.” The paper reflects a shift away from a market made up of mixed decisions between humans and machines, in which humans have time to assess information, to a system primarily governed by ultrafast machines dictating pricing.
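The run-based definition above could be turned into a small filter. This is only a sketch under my reading of that definition, not the authors’ method: it ignores timestamps entirely (so it enforces no duration threshold), counts every change in price as one tick, and runs on a made-up price series.

```shell
# Hypothetical series, one price per line: eleven down-ticks of 0.10 from
# 100.00, followed by a single up-tick.
awk 'BEGIN {
    p = 100; printf "%.2f\n", p
    for (i = 0; i < 11; i++) { p -= 0.10; printf "%.2f\n", p }
    printf "%.2f\n", p + 0.10
}' > prices.txt

awk '
# Report the run that just ended if it meets both thresholds.
function report(   pct) {
    if (run < 10) return                       # fewer than ten same-direction ticks
    pct = (prev - start) / start * 100         # percent change over the run
    if (pct > 0.8 || pct < -0.8)
        printf "%s run: %d ticks, %.2f%%\n", (dir > 0 ? "up" : "down"), run, pct
}
NR == 1 { prev = $1 + 0; next }
{
    price = $1 + 0
    d = (price > prev) ? 1 : ((price < prev) ? -1 : 0)
    if (d == 0) next                           # unchanged price is not a tick
    if (d == dir) {
        run++
    } else {
        report()                               # direction flipped: close the old run
        dir = d; run = 1; start = prev
    }
    prev = price
}
END { report() }
' prices.txt > runs.txt

cat runs.txt
```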

These articles have made me intrigued by the “quantum” properties of HFT at the sub-second time level. In the same way that Newtonian physics gives way to different behaviors at the subatomic level, financial systems (generated from the interplay of programmed agents collectively displaying complex behaviors) have similar properties at the sub-second level – namely at time periods below the threshold of human perception: 650 ms. As Wired states, “While market behavior tends to rise and fall in patterns that repeat themselves, fractal-style, in periods of days, weeks, months and years, ‘that only holds down to the time scale at which humans stop being able to respond,’ said Johnson. ‘The fractal gets broken.’”

These areas of inquiry lead me back to the first paper I read on this topic, when my interests were primarily on Colocation and the effect of distance on high frequency trading. I went back and re-read it, and some things were made clearer, and others not. There is a lot of math there, but here’s what the paper describes in my understanding.

Wissner-Gross, A., & Freer, C. (2010). Relativistic statistical arbitrage. Physical Review E, 82(5), 1-7. doi:10.1103/PhysRevE.82.056104

1. The speed of HFT – with typical trade latencies below 500 microseconds – has made the speed that information travels over distance relevant. Basically, firms are bumping up against a fundamental physical constant, the speed of light.
2. The paper calculates optimal nodes for communication with multiple exchanges.
3. “Within financial markets, the relevant time series are typically the logarithms of the prices (log-prices) of financial instruments.” (look into log prices)
4. They use the Vasicek model to describe the behavior of “correlated financial instruments.” Based on Brownian motion.
5. The optimal intermediate location simplifies to the two center locations weighted by speeds of reversion. The speeds of reversion scale with market turnover velocities. The optimal intermediate locations are midpoints weighted by turnover velocity.
6. Note that while some nodes are in regions with dense fiber-optic networks, many others are in the ocean or other sparsely connected regions, perhaps ultimately motivating the deployment of low-latency trading infrastructure at such remote but well-positioned locations.
7. “Such slowing or stopping of the propagation of pricing information due to arbitrage is somewhat analogous to the refraction and scattering of light by a dielectric medium, but novel in an econophysical context…This result also raises the possibility of establishing arbitrage analogs of other concepts from optics and acoustics, such as reflection and diffraction.” Perhaps there’s something here I can tap for an installation.
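Items 3 and 4 above can be sketched in a few lines of Python: convert prices to log-prices, then simulate a single mean-reverting series with the Vasicek (Ornstein-Uhlenbeck) model, dX = theta * (mu - X) dt + sigma dW. This is only a one-instrument illustration under assumed parameter names (`theta` for speed of reversion, `mu` for the long-run level, `sigma` for volatility); the paper itself works with correlated instruments across exchanges, which this does not attempt.

```python
import math
import random


def log_prices(prices):
    """Natural logarithm of each price (prices must be positive)."""
    return [math.log(p) for p in prices]


def simulate_vasicek(x0, mu, theta, sigma, dt, steps, seed=0):
    """Simulate dX = theta * (mu - X) dt + sigma dW using the exact
    discretization of the Ornstein-Uhlenbeck process (theta must be > 0)."""
    rng = random.Random(seed)
    decay = math.exp(-theta * dt)                      # how much of the gap to mu survives one step
    noise_sd = sigma * math.sqrt((1 - decay ** 2) / (2 * theta))
    path = [x0]
    for _ in range(steps):
        x = mu + (path[-1] - mu) * decay + noise_sd * rng.gauss(0, 1)
        path.append(x)
    return path


# Example: a log-price that starts above its long-run level and reverts toward it.
start = log_prices([25.0])[0]
path = simulate_vasicek(x0=start, mu=math.log(20.0), theta=3.0, sigma=0.1, dt=0.01, steps=500)
```

A larger `theta` pulls the series back toward `mu` faster, which is the "speed of reversion" that item 5 says the optimal locations are weighted by.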

I’m still trying to determine how to tie what I’d like to build for Thesis to the research, programming, and writing I’ll be doing in Algorithms, Research Studio.


3D Finance in Unity

Posted: February 21st, 2012 | Author: genevieve | Filed under: Unity3d | No Comments »

Miguel and I worked together to generate financial data as a three-dimensional landscape. The goal was to see whether a 3D environment would be a new or interesting way to explore a familiar (to some more than others) view of financial data — the daily stock charts. Conceptually, we wanted to connect the idea that the financial system is really an environment that we are all living within, and feeling the effects of. The thought was that data that evoked a natural landscape, like mountains or canyons, would make people connect the financial system to a “natural” system. Other artists have already made this metaphorical connection, like Michael Najjar, but we were curious about what the effects of rendering this in a manufactured 3D environment might be.

We began with an OpenFrameworks program that Miguel wrote to parse Yahoo Financial Data, and render it as a heightmap. We chose a window of historical stock data to analyze: Jan 1, 2000 – Feb 7, 2012, which we hoped would give enough of a range of time, but not be too long before certain tech stocks were publicly traded.

Our first thought of how to render the data so that it would make a terrain was inputting it into a Perlin Noise function. Miguel found a good tutorial for generating noise in C++, which also provided a better understanding of how noise works, and then wrote a nice program that parsed the data csv, and fed values into noise. We determined that the closing price each day and the volume of trades were the values we should take into consideration the most, and fed these values into noise.
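For a sense of what feeding values into noise can look like, here is a small Python sketch of 2D Perlin-style gradient noise. Miguel's program was in C++ and the post does not say exactly how price and volume were passed in, so the mapping below (closing price scaled to the noise x coordinate, volume to the y coordinate) and the names `make_perlin` and `terrain_row` are assumptions for illustration only.

```python
import math
import random


def make_perlin(seed=0):
    """Build a 2D gradient-noise function from a seeded permutation table."""
    rng = random.Random(seed)
    perm = list(range(256))
    rng.shuffle(perm)
    perm += perm                                   # doubled so lookups never overflow
    # Eight unit gradient vectors, one per compass direction.
    grads = [(math.cos(k * math.pi / 4), math.sin(k * math.pi / 4)) for k in range(8)]

    def fade(t):
        return t * t * t * (t * (t * 6 - 15) + 10)  # smooth interpolation curve

    def lerp(a, b, t):
        return a + t * (b - a)

    def corner(ix, iy, dx, dy):
        # Dot product of the lattice corner's gradient with the offset to the sample.
        gx, gy = grads[perm[perm[ix & 255] + (iy & 255)] % 8]
        return gx * dx + gy * dy

    def noise(x, y):
        x0, y0 = math.floor(x), math.floor(y)
        fx, fy = x - x0, y - y0
        u, v = fade(fx), fade(fy)
        top = lerp(corner(x0, y0, fx, fy), corner(x0 + 1, y0, fx - 1, fy), u)
        bottom = lerp(corner(x0, y0 + 1, fx, fy - 1),
                      corner(x0 + 1, y0 + 1, fx - 1, fy - 1), u)
        return lerp(top, bottom, v)

    return noise


def terrain_row(closes, volumes, noise, scale=4.0):
    """Assumed mapping: scaled closing price -> noise x, scaled volume -> noise y."""
    lo_c, hi_c = min(closes), max(closes)
    lo_v, hi_v = min(volumes), max(volumes)
    row = []
    for c, v in zip(closes, volumes):
        x = scale * (c - lo_c) / ((hi_c - lo_c) or 1)
        y = scale * (v - lo_v) / ((hi_v - lo_v) or 1)
        row.append(noise(x, y))
    return row
```

Because the noise function scrambles its inputs, two stocks with similar data can produce quite different heights.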

The result makes for good terrain, lots of peaks and valleys, but I started to worry that it was a bit too random – that the comparison of multiple stocks would cease to have meaning.

I tried going about generating a 3D mesh straight from the data itself. I finally plotted the points, but although they ended up in 3D space it looked pretty linear (or planar I should say). I needed something that had volume.

Just to illustrate, here’s the data graphed in 3D, with time as the x value, volume as y, and closing price as z:

It’s a lot truer to the data, but it doesn’t give me the same terrain to play with as the noise heightmap. So, I started thinking about various ways I could achieve a similar result while staying “truer” to the data. I tried to figure out if there was a way to start with a base plane mesh, and then extrude along the y axis for every point that aligned with the x and z values. It turns out that an image is a good way to generate a planar mesh (at least in OF), so after talking with Greg, I decided to write up a quick Processing sketch that took 2 columns of data as input and output a grayscale image. This combined the grayscale heightmapping with the mesh technique of mapping the data points to a shared grid system.

To generate the grayscale image, my process was:
1. Get data to read into Processing
2. Map data to width and height of image. x = time, y = closing price
3. Map volume data (what we want to extrude by) to color, 0 – 255.
4. Step through x and y, but plot volume as grayscale circles, radius getting smaller as color gets lighter (taller)
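The steps above can be sketched roughly as follows. The original was a Processing sketch; this is a Python approximation that uses a plain list-of-lists as the image, and the image size, radius range, and the rule that the lighter value wins where circles overlap are all assumptions.

```python
def scale(value, lo, hi, out_lo, out_hi):
    """Linearly map value from [lo, hi] to [out_lo, out_hi]."""
    if hi == lo:
        return out_lo
    return out_lo + (value - lo) * (out_hi - out_lo) / (hi - lo)


def render_heightmap(closes, volumes, width=64, height=64, max_radius=6, min_radius=1):
    """x = time (row order), y = closing price, gray level = volume."""
    img = [[0] * width for _ in range(height)]      # black background
    n = len(closes)
    lo_c, hi_c = min(closes), max(closes)
    lo_v, hi_v = min(volumes), max(volumes)
    for i, (c, v) in enumerate(zip(closes, volumes)):
        cx = round(scale(i, 0, n - 1, 0, width - 1))
        cy = round(scale(c, lo_c, hi_c, height - 1, 0))   # higher price sits nearer the top
        gray = round(scale(v, lo_v, hi_v, 0, 255))
        # Lighter (taller) points get smaller circles.
        radius = round(scale(gray, 0, 255, max_radius, min_radius))
        for y in range(max(0, cy - radius), min(height, cy + radius + 1)):
            for x in range(max(0, cx - radius), min(width, cx + radius + 1)):
                if (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2:
                    img[y][x] = max(img[y][x], gray)      # assumption: lighter value wins
    return img


def write_pgm(img, path):
    """Save the grid as a plain-text PGM file, which most image tools can open."""
    with open(path, "w") as f:
        f.write(f"P2\n{len(img[0])} {len(img)}\n255\n")
        for row in img:
            f.write(" ".join(str(p) for p in row) + "\n")
```

Note that with this mapping the lowest-volume day maps to gray 0 and disappears into the black background.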

Here is the image result from this quick test:

After testing it in OpenFrameworks to render a 3D mesh, I decided that this was the direction to move forward with in Unity.

With this in mind, Miguel and I sat down to start generating the terrain in Unity. Working from the HeightmapGenerator from the Procedural Examples, we set out to create a financial data terrain programmatically. There might have been a better use of our time than trying to do everything in code, but it definitely made us a lot more familiar with the Unity docs and general structure of writing programs for this environment.

Unsure how much would be visible if we didn’t generate it programmatically, we erred on that side, but underestimated the time it would take to implement. Saving the scene wouldn’t work as we tested out placement and textures. The end result is a bit obvious, and looks sort of like a candy raver emerald city in the clouds, but I like it. Here are some screenshots.

Though not really interactive, or even animated, the final result does use an external input to generate imagery. The only interaction is a camera flyaround. Next steps, like changing the terrain based on other financial forces (overall market trends, the Dow Jones divisor), might be interesting. The stocks featured are the 30 stocks in the DJIA – Dow Jones Industrial Average.


Financial Data Meshes

Posted: February 19th, 2012 | Author: genevieve | Filed under: Appropriating New Technologies, Thesis, Unity3d | No Comments »

Continuing from my previous post, I was able to get OpenFrameworks working to extrude the brightness values into the Y axis. I tried first with the noise images that Miguel and I generated at the beginning of the week.
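As a rough illustration of what extruding brightness into the Y axis means, here is a minimal Python sketch that turns a grayscale grid into mesh vertices and triangles. It is not the openFrameworks code used here; the function name, the `max_height` parameter, and the triangle winding order are assumptions.

```python
def heightmap_to_mesh(img, max_height=50.0):
    """Turn a grid of 0-255 brightness values into (vertices, triangles).

    Each pixel becomes a vertex at (x, height, z), where height is the
    brightness scaled to max_height. Each square of four neighbouring
    pixels becomes two triangles, given as indices into the vertex list.
    """
    rows, cols = len(img), len(img[0])
    vertices = [
        (float(x), img[z][x] / 255.0 * max_height, float(z))
        for z in range(rows)
        for x in range(cols)
    ]
    triangles = []
    for z in range(rows - 1):
        for x in range(cols - 1):
            a = z * cols + x        # top-left corner of this grid square
            b = a + 1               # top-right
            c = a + cols            # bottom-left
            d = c + 1               # bottom-right
            triangles.append((a, c, b))
            triangles.append((b, c, d))
    return vertices, triangles
```

White pixels (255) end up at `max_height` and black pixels (0) stay on the ground plane.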

Here’s the noise image in 2D:

And here it is extruded into 3D:

Here you can see them both, so you get the idea of how the brightness maps to height:

So, the form looks pretty good. But it also looks pretty random. By feeding financial data into a noise function, we can create a great looking terrain, but it starts to be so far removed from the data itself that it loses meaning. So, I tried to think about how I could generate mesh straight from the data. I finally plotted the points, but although they ended up in 3D space it looked pretty linear (or planar I should say). I needed something that had volume.

Just to illustrate, here’s the data graphed in 3D, with time as the x value, volume as y, and closing price as z:

It’s a lot truer to the data, but it doesn’t give me the same terrain to play with as the noise heightmap. So, I started thinking about various ways I could achieve a similar result while staying “truer” to my data. I tried to figure out if there was a way to start with a base plane mesh, and then extrude along the y axis for every point that aligned with the x and z values. It turns out that an image is a good way to generate a planar mesh (at least in OF), so after talking with Greg, I decided to write up a quick Processing sketch that took 2 columns of data as input and output a grayscale image. This combined the grayscale heightmapping with the mesh technique of mapping the data points to a shared grid system.

To generate the grayscale image, my process was:
1. Get data to read into Processing
2. Map data to width and height of image. x = time, y = closing price
3. Map volume data (what we want to extrude by) to color, 0 – 255.
4. Step through x and y, but plot volume as grayscale circles, radius getting smaller as color gets lighter (taller)

Here is the image result I got from this quick test:

I opened it up in Photoshop to do a little blurring. I’m not sure if it was necessary, but I didn’t want jagged steps between gradient values. Oh, and I also cropped, which I should probably implement in code but oh well:
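Doing the blur and crop in code could look something like this minimal Python sketch. It uses a simple box blur as a stand-in, since the post does not say which Photoshop blur was applied, and it treats the image as a plain list of rows of 0-255 values; both function names are hypothetical.

```python
def box_blur(img, radius=1):
    """Replace each pixel with the average of its neighbours within `radius`.

    Pixels near the border average over whatever neighbours exist.
    """
    rows, cols = len(img), len(img[0])
    out = [[0] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            total = count = 0
            for yy in range(max(0, y - radius), min(rows, y + radius + 1)):
                for xx in range(max(0, x - radius), min(cols, x + radius + 1)):
                    total += img[yy][xx]
                    count += 1
            out[y][x] = round(total / count)
    return out


def crop(img, left, top, width, height):
    """Return the width x height region whose top-left corner is (left, top)."""
    return [row[left:left + width] for row in img[top:top + height]]
```

A larger `radius` softens the steps between gray levels more, at the cost of flattening sharp peaks.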

Final step was importing the image back into OF and generating a heightmap there. I used code from James George which made this pretty simple. I ran into a memory error, but I believe that had to do with pixel referencing or something. Here’s a little screencap video that I set to music so it’s a bit more enjoyable to just hang out in dataland. The glitches are due to my computer not enjoying recording and running the app at the same time. Perhaps I need to ramp down the framerate for next time, not that I mind it much.