The Neighborhood Network Watch Keyword Analysis Application (NNWKAA) v3.5

April 29th, 2008 by ecm292

The Neighborhood Network Watch Keyword Analysis Application (NNWKAA) is a text analysis application designed to look for words associated with terrorism or national security threats in raw network dumps from a packet sniffer. However, it is not meant to be an accurate tool by any means; it is designed specifically to hyper-inflate the amount of perceived terror found in the network traffic dumps. This amplification of terror acts as both the fuel and the pretense for the ideology that governs the fictitious group that uses the application, The Neighborhood Network Watch. It also demonstrates that software, systems, and apparatuses are not necessarily objective, infallible entities. Rather, they are subjective: they are governed by the entities that designed them and reproduce the ideologies of their creators. In turn, these apparatuses are able to reproduce the conditions for renewed production time and time again, since a system is rarely questioned after it has been created, specialized, and made complex.

How It Operates

The NNWKAA first breaks down the raw network dumps by attempting to strip out extraneous and useless information, which includes header info and HTML and CSS tags. Next the NNWKAA looks at each word within the network dump. Each word is checked to see if it is on a list of flagged words that are known to be associated with terrorism or national security threats. This list is known as the Neighborhood Network Watch Keyword List (NNWKL). It is based on an ECHELON word list that has been supplemented with data scraped from the FBI and INTERPOL websites. If a word is flagged, the application pulls the words preceding and following the flagged word. These become contextual words that may eventually be added to a separate word list, which acts as a supplement to the NNWKL and allows the application's word lists to learn over time and automatically expand, or initiate mission creep. If a word is not flagged, it is checked against a dictionary to see if it is indeed a word. From here the application calculates the “Terror Percentage,” which is the number of flagged words as a percentage of the total number of words found. The counts for both flagged words and the total number of words are also output. Based on this percentage the application generates a rating for the network traffic using a rating system similar to that of the Department of Homeland Security, both in verbiage and in description. Along with this rating, a listing of the top 20 flagged words, also known as the hit parade, is generated and displayed. At this point the contextual words that were found around the flagged words have probabilities calculated for them to see if they meet the minimum probability required to be added to the supplementary word list. There is a minimum occurrence threshold as well. Words that qualify are then added to the supplementary word list, and the statistical results and the hit parade are output to text files.
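As a rough illustration, the flagging pass described above might look something like the following Java sketch. It is not taken from the NNWKAA source: the class name FlaggingSketch, the Result holder, the lower-casing of tokens, and the choice to count only flagged and dictionary words toward the total are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the flagging pass; names are illustrative, not from the NNWKAA source.
class FlaggingSketch {

    /** Counts and context words collected during one pass over the tokens. */
    static final class Result {
        int flagged;                                        // words found on the keyword list
        int total;                                          // flagged words plus valid dictionary words
        final List<String> contextWords = new ArrayList<>(); // neighbors of flagged words

        /** Flagged words as a percentage of all counted words (0 when nothing was counted). */
        double terrorPercentage() {
            return total == 0 ? 0.0 : 100.0 * flagged / total;
        }
    }

    static Result analyze(List<String> tokens, Set<String> keywords, Set<String> dictionary) {
        Result r = new Result();
        for (int i = 0; i < tokens.size(); i++) {
            String word = tokens.get(i).toLowerCase(); // assumption: matching is case-insensitive
            if (keywords.contains(word)) {
                r.flagged++;
                r.total++;
                // Pull the words directly before and after the flagged word.
                if (i > 0) r.contextWords.add(tokens.get(i - 1).toLowerCase());
                if (i + 1 < tokens.size()) r.contextWords.add(tokens.get(i + 1).toLowerCase());
            } else if (dictionary.contains(word)) {
                r.total++; // a real word that is not flagged
            }
            // Anything else is treated as gibberish and ignored (assumption).
        }
        return r;
    }

    public static void main(String[] args) {
        Set<String> keywords = new HashSet<>(Arrays.asList("bomb"));
        Set<String> dictionary = new HashSet<>(Arrays.asList("the", "is", "here"));
        Result r = analyze(Arrays.asList("The", "bomb", "is", "xqzt", "here"), keywords, dictionary);
        System.out.println(r.flagged + " flagged of " + r.total
                + " words, terror percentage " + r.terrorPercentage());
    }
}
```

Both word lists are held in hash sets here so that each lookup is a single hash probe rather than a scan of the whole list.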

Demonstration

Here we have a demonstration video of what the NNWKAA looks like when it runs and how it runs. For the most part it is fairly opaque, giving you tidbits of what might be going on but not really the whole story.

Source Code
The source code is available for download here. It includes the standard keyword list and dictionary. It does not include any network dumps. The application can accept any type of text file, not only network dumps.

A2Z Final - NNWKAA v3.5

March 31st, 2008 by ecm292

I will be continuing to work on the Neighborhood Network Watch Keyword Analysis Application (NNWKAA). The new things I will be working on will fall into the categories of text analysis and visualization.

Text Analysis

The Hit Parade

Currently, none of the words flagged by the NNWKAA are ever seen by the public, and the Neighborhood Network Watch Keyword List (NNWKL) is kept entirely secret. However, since its second revision there has been quite a large call to be able to see, in some form or another, either the whole keyword list or parts of it. Since the NNWKL is supposedly a list that contains classified items that could potentially impact national security if they were released to the public, it has been difficult to resolve this tension. This will be the first version that addresses the issue, and it will do so with a listing of the top 20-30 flagged words: essentially a “hit parade” of the top flagged words from the network traffic sample for that network.
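A hit parade of this kind can be sketched in Java as a frequency count followed by a sort. The class name HitParadeSketch, the topFlagged method, and the alphabetical tie-break are assumptions for illustration only, not details from the application.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a "hit parade"; names are illustrative only.
class HitParadeSketch {

    /** Returns up to `limit` flagged words with their counts, most frequent first. */
    static List<Map.Entry<String, Integer>> topFlagged(List<String> flaggedWords, int limit) {
        // Count how many times each flagged word occurred.
        Map<String, Integer> counts = new HashMap<>();
        for (String word : flaggedWords) {
            counts.merge(word, 1, Integer::sum);
        }
        // Sort by count descending; ties are broken alphabetically (assumption).
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        return new ArrayList<>(entries.subList(0, Math.min(limit, entries.size())));
    }

    public static void main(String[] args) {
        List<String> hits = Arrays.asList("bomb", "attack", "bomb", "virus", "attack", "bomb");
        for (Map.Entry<String, Integer> e : topFlagged(hits, 20)) {
            System.out.println(e.getKey() + " " + e.getValue());
        }
    }
}
```

Passing 20 or 30 as the limit gives the range of list lengths mentioned above without exposing the rest of the keyword list.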

NNWKL Auto-Expansion

Version 3.5 will introduce either an automated way to expand the keyword list with words that are commonly found around flagged words, or possibly a way to suggest words that should be added to the NNWKL. This will rely on generating Markov chains for the flagged words, then looking at the probabilities of the words that may follow those flagged words. Only words that meet a set probability threshold will be nominated for inclusion in the NNWKL. These words may be stored in an additional list, or possibly added to the NNWKL and then written out. If the words are stored in an additional file, the app will need to check whether the supplemental file already exists; if not, it will need to be generated. Whether to target all words or just the flagged words is not yet decided. Also undecided is whether to look at words preceding the flagged words as well.
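One way to picture the nomination step is as a first-order transition table: for each flagged word, count which words follow it, turn the counts into probabilities, and keep the followers that clear a threshold. The Java sketch below works under those assumptions. The names ExpansionSketch and nominate, the minimum-occurrence parameter, and the threshold values in the example are hypothetical, and only following words are considered.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of keyword-list auto-expansion; not the NNWKAA implementation.
class ExpansionSketch {

    /**
     * Nominates words that follow a flagged word with probability >= minProbability
     * and that were seen after that flagged word at least minOccurrences times.
     */
    static Set<String> nominate(List<String> tokens, Set<String> keywords,
                                double minProbability, int minOccurrences) {
        // transitions.get(k).get(w) = number of times w directly followed flagged word k
        Map<String, Map<String, Integer>> transitions = new HashMap<>();
        for (int i = 0; i + 1 < tokens.size(); i++) {
            String current = tokens.get(i);
            if (keywords.contains(current)) {
                transitions.computeIfAbsent(current, k -> new HashMap<>())
                           .merge(tokens.get(i + 1), 1, Integer::sum);
            }
        }
        Set<String> nominated = new TreeSet<>();
        for (Map<String, Integer> followers : transitions.values()) {
            int total = 0;
            for (int count : followers.values()) total += count;
            for (Map.Entry<String, Integer> e : followers.entrySet()) {
                double probability = (double) e.getValue() / total;
                boolean alreadyListed = keywords.contains(e.getKey());
                if (probability >= minProbability && e.getValue() >= minOccurrences && !alreadyListed) {
                    nominated.add(e.getKey());
                }
            }
        }
        return nominated;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList(
                "bomb", "plans", "bomb", "plans", "bomb", "squad", "attack", "now");
        Set<String> keywords = new HashSet<>(Arrays.asList("bomb", "attack"));
        // Example thresholds only; the real values are not given in this post.
        System.out.println(nominate(tokens, keywords, 0.5, 2));
    }
}
```

Looking at preceding words as well would only require a second table built from tokens.get(i - 1).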

Automated Network Threat Advisories

A suggested network threat advisory, or rating, for the network will be added in v3.5. The rating will be given in the standard color-coded/verbal form, i.e. Yellow - Elevated.
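A minimal Java sketch of such a rating follows. The five levels and colors mirror the Department of Homeland Security advisory scale, but the percentage cutoffs are invented purely for the example, since the actual criteria are not stated here. The names AdvisorySketch, rate, and describe are likewise hypothetical.

```java
// Hypothetical sketch of mapping a terror percentage to an advisory level.
class AdvisorySketch {

    /** Levels and colors in the style of the DHS advisory scale. */
    enum Level {
        LOW("Green"), GUARDED("Blue"), ELEVATED("Yellow"), HIGH("Orange"), SEVERE("Red");

        final String color;

        Level(String color) {
            this.color = color;
        }
    }

    /** Maps a terror percentage to a level. The cutoffs are made-up placeholders. */
    static Level rate(double terrorPercentage) {
        if (terrorPercentage < 1.0) return Level.LOW;
        if (terrorPercentage < 3.0) return Level.GUARDED;
        if (terrorPercentage < 6.0) return Level.ELEVATED;
        if (terrorPercentage < 10.0) return Level.HIGH;
        return Level.SEVERE;
    }

    /** Formats a level in the color/verbal style, for example "Yellow - Elevated". */
    static String describe(Level level) {
        String name = level.name();
        return level.color + " - " + name.charAt(0) + name.substring(1).toLowerCase();
    }

    public static void main(String[] args) {
        System.out.println(describe(rate(4.2)));
    }
}
```

Keeping the cutoffs in one method makes them easy to rework if the amount of flagged words changes between versions.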

Visualization

The visualization aspect of the NNWKAA v3.5 will be similar to the look and feel of v2. The visual aspects are broken down into three categories: loading, analysis, and results.

Loading

The loading section will occur while the files are being loaded into their respective hash tables. It will include the following:

  • The name of the network dump (.cap) file that is being analyzed
  • Load/Progress bars for loading the NNWKL, the Carnegie Mellon Dictionary, and the incoming network dump file

Analysis

  • Providing visual feedback of what the app is currently doing, i.e. “flagging” or “statistical analysis”
  • During word flagging, possibly displaying the fact that a flagged word has been found, i.e. “FLAGGED”
  • Progress bars

Results

  • Hit Parade- display of the words found on the hit parade, possibly looping in sequential order with their counts
  • Total number of words that were flagged
  • The “Terror Percentage”- the percentage of flagged to non flagged words
  • Suggested Network Advisory aka rating

Auto NNW Text Generator

March 24th, 2008 by ecm292

So, I tried running 26 different pieces of text through the generator. These texts describe various aspects of the Neighborhood Network Watch from the group's own perspective, as presented on their site. I went through n-gram levels everywhere from 2 to 6 to generate these new texts.
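For readers unfamiliar with the technique, here is a minimal Java sketch of an n-gram text generator. It is not the generator used for the output below. It assumes word-level n-grams, with the level taken to mean the number of words in each lookup key, and the names NgramTextSketch, buildModel, and generate are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical sketch of a word-level n-gram text generator.
class NgramTextSketch {

    /** Maps each run of `order` consecutive words to the words seen directly after it. */
    static Map<String, List<String>> buildModel(String[] words, int order) {
        Map<String, List<String>> model = new HashMap<>();
        for (int i = 0; i + order < words.length; i++) {
            String key = String.join(" ", Arrays.copyOfRange(words, i, i + order));
            model.computeIfAbsent(key, k -> new ArrayList<>()).add(words[i + order]);
        }
        return model;
    }

    /** Starts from the first `order` words and repeatedly appends a randomly chosen successor. */
    static String generate(String source, int order, int maxWords, Random rng) {
        String[] words = source.trim().split("\\s+");
        if (words.length < order) return source.trim();
        Map<String, List<String>> model = buildModel(words, order);
        List<String> out = new ArrayList<>(Arrays.asList(words).subList(0, order));
        while (out.size() < maxWords) {
            String key = String.join(" ", out.subList(out.size() - order, out.size()));
            List<String> successors = model.get(key);
            if (successors == null) break; // dead end: this n-gram only occurred at the end of the source
            out.add(successors.get(rng.nextInt(successors.size())));
        }
        return String.join(" ", out);
    }

    public static void main(String[] args) {
        String source = "the watch collects data and the watch analyzes data and the watch reports";
        System.out.println(generate(source, 2, 12, new Random()));
    }
}
```

Higher levels leave fewer successors to choose from at each step, which is consistent with the higher-level output below reproducing whole sentences from the source texts.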

2 ngrams

Proprietors of Homeland Security’s (DHS) National Archives and data that the 1960’s.
The Neighborhood Network Watch becomes the Network Watch investigates.
All networks of the first conceived with the community.
Interpret – Protect our website.
Forms that contains any terrorist activity and press on captured network traffic or threatening network traffic.
This is always looking to form a part of public networks.
All networks included were also presented the participating countries.
These optional items can pinpoint areas that have been browsed.
The first Neighborhood Network Watch Information Infrastructure Protection Act.
Lastly we read pages that is not only share the spread of methods and such.
It is plain and the advisories issued, for later analysis.
Information If you browse, read pages that include the Neighborhood Network Watch Wordlist.
It makes use the Neighborhood Network Watch.
He had a part of 2007 collections.
These are made a contributing member posing the work of the requirements of 2007 Findings.
Most commonly this creature is constantly being much more precise locational awareness.
Lobsters have had a smarter “spotter” or wireless routers as an outside website(s).
Some cookies collect personal information about what the Spring 2007 findings of the website and zones.
This information vital to our website.
Proprietors of using the future.
Martin presented the networks included information listed above with the Spring 2007 Findings.
As well as to look at times.
The Situational Assessments Statistical Analysis Application (NNWKAA).
This information listed below: Soy Luck Club Starbucks- Greenwhich Starbucks-W.
This data from occurring within the types of the Neighborhood Network Advisory System.
Proprietors of the ability to incorporate new technology used by the scope of visitors use.
When you visit our nation and began operations budgets.
The Neighborhood Network Watch does not a linked websites.
The NYC Chapter’s Fall 2007 findings of Homeland Security and its mission or block radius.
By navigating to be in order to look at the Neighborhood Network Advisory System.
It makes use that are concerned especially from the East Village below 14th Street.
Each cell captures that the 1960’s.
We reserve the geo-spatial awareness and INTERPOL.
The Neighborhood Network Watch site more information on a blanketed regional level as ECHELON.
These include Los Angeles, the benefit of data collection samples taken.
Most commonly this service remains available to establish geo-spatial placement of diversity continues.
Green Zone 1 and Greenwich Village Areas.
We reserve the Neighborhood Network Identification and press on Terror.
Emissary to establish situational awareness of 2007.
The first official operations budgets.
Unauthorized attempts to instant messages, web sites, images, etcetera.
A device with another government computer system employs commercial marketing.
The Neighborhood Network Watch Network Watch Minority Ethics Commission issues these links as Starbucks.
This data to maintain its world through silt.
This application developed for different networks.
These cookies collect personal information when your computer’s hard drive and are used for information.
For site and have seen improved yields.
The data and to your request for visiting the East Village areas south of them.
Examples of words that can be underestimated, in the Network Watch.
One new wordlist on behalf of the public networks was conceived with the Neighborhood Network Watch.

3 ngrams

Collect – Collect large amounts of specialized knowledge, training, maintenance and generally much higher operating costs.
The section below explains how we handle and collect technical information when you visit our website.
Once these regions are identified they can be obtained relatively easily for relatively low costs.
With such low entry barriers to conducting operations on behalf of the technology section.
This small green zone of low terrorist activity is a paramount concern.
We are also concerned about Green Zone 1.
The third member may serve as a smarter “spotter” or wireless network detector.
This is the Neighborhood Network Watch Wordlist is a particularly dangerous region.
This area is the current list and links to these presences: -Facebook -MySpace -YouTube -Google Video
So, far in comparison with the U.S. Department of Homeland Security Advisory System.
It makes use of two main software applications.
The first operations from the DHS, DoD, and NSA currently.
Forms that can function as a smarter “spotter” or wireless network detector.
Most of these networks from October to November 2007.
Some of the technology section.
The section below explains how we handle and collect technical information when you visit our website.
All of the types of technology our visitors use.
The Neighborhood Network Watch NYC fall 2007 findings.
In Summary we have looked at twenty-eight different networks.
The first is TCPDUMP and the Data Analysis Division (DAD) for analysis.
The following results are the East Village, West Village, Greenwich Village and SoHo areas.
The NYC Network Identification and Collection Division (NICD) collected network traffic from them.
One audience member asked about what type of encryption if any is used.
Lastly we need to explore more thoroughly rooting out the sources of terrorist network activity etcetera.
Most commonly this includes, emails, instant messages, to websites browsed, essentially everything and anything.
Proprietors of the technology used by NICD teams please visit the technology section.
These cookies collect personal information to the Strategic Value of Web 2.0.
This is some eye-opening stuff.” A device with this information by direct contact.
The main benefits are being transferred over networks.
Most commonly this includes, emails, instant messages, web sites, images, etcetera.
The TCPDUMP application cannot only see all this information by direct contact.
As you can see here Orange Zone 1 and to generate definable regions and zones.
Interpret – Discover trends in the future.
He also presented the preliminary NYC Chapter’s Fall 2007 Findings.
We concerned in the future.
This is the East Village below 14th Street, south of 14th Street.
Cookies When you visit our website.
We are also concerned about Green Zone 1.
As well as pay for use networks, such as Starbucks.
The Neighborhood Network Watch would be founded in New York City Area in April 2007.
It seems only appropriate that the first official operations the Neighborhood Network Watch.
He also presented the preliminary NYC Chapter’s Fall 2007 findings reports.
For career information please visit the career section.
This is the current list and links to these presences: -Facebook -MySpace -YouTube -Google Video
Union Square Park and Greenwich Village areas seems to be the first Neighborhood Network Watch.
He had envisioned a community-based group that would monitor our nations public networks.
However, due to the privacy and security policies of the linked website.
Martin presented general information about your visit.
This information is only used to fulfill your request for information.
Interpret – Discover trends in the areas south of 14th Street.

4 ngrams

Especially so in the highly technologically mediated world we live in today.
If you have any additional questions please email us at contact@dhsnnw.org
A campaign aimed at rooting out the sources of terrorist or threatening network traffic.
This information never identifies who you are.
This information is only used to help us get you the information you have requested.
We only share the information you have requested.
This information may also be shared if there is a potential concern regarding national security.
We reserve the right to create individual profiles and give these profiles to any governing body.
The Neighborhood Network Watch Wordlist.
He also presented the preliminary NYC Chapter’s Fall 2007 Findings.
The first operations from the spring of 2007 were made public in May of 2007.
This information never identifies who you are.
This information is only used to fulfill your request for information.
The section below explains how we handle and collect technical information when you visit our website.
These additions have included information and data sets from the FBI and INTERPOL.
The Neighborhood Network Watch Linking Policy.
Prohibitions Neighborhood Network Watch when your browser is open.
In addition we collect some technical information when you visit our website.
It makes use of two main software applications.
Currently the group has begun to look at the Spring 2007 collections.
These networks have had a second set of collection samples taken.
In all twenty-eight networks were profiled.
The networks are listed below: Soy Luck Club Starbucks- Greenwhich Starbucks-W.
Union Square Park Tea Spot Washington Square Park-missing Starbucks- Washington Sq.
So, far in comparison with the previous wordlist we have seen improved yields.
Most commonly this includes, emails, instant messages, and web pages that have been browsed.
In fact we can even reconstruct images transmitted.
In fact privacy is a relative term with a definition that is constantly being redefined.
Especially so in the highly technologically mediated world we live in today.
If you have any additional questions please email us at contact@dhsnnw.org
Neighborhood Network Watch when your browser is open.
In addition we collect some technical information when you visit our website.
It is also actively recruiting members for the work of the Neighborhood Network Watch.
Emissary Martin, went on to clarify that the Neighborhood Network Watch.
This application is employed by the Neighborhood Network Watch.
He also presented the preliminary NYC Chapter’s Fall 2007 Findings.
These roles maybe carried out by a single individual at times.
The third member may serve as a driver or secondary “collector,” “spotter,” or a dedicated pathfinder.
Cookies When you visit some websites, their webservers generate pieces of information known as cookies.
Some cookies collect personal information to recognize your computer in the future.
This is the case at Neighborhood Network Watch, where we use persistent cookies.
These cookies collect personal information to recognize your computer in the future.
This is the current list and links to these presences: -Facebook -MySpace -YouTube -Google Video
A triumph for the safety of our nation and our communities.
As well as helping to build stronger communities and a stronger nation.
Most commonly this includes, emails, instant messages, and web pages that have been browsed.
In fact we can even reconstruct images transmitted.
In fact privacy is a relative term with a definition that is constantly being redefined.
Especially so in the highly technologically mediated world we live in today.
If you have any additional questions please email us at contact@dhsnnw.org

5 ngrams

These optional items can be obtained relatively easily for relatively low costs.
With such low entry barriers to conducting operations on behalf of the group.
For career information please visit the career section.
The light from these cells is then focused to form a single, intensified image.
Neighborhood Network Watch Keyword Analysis Application and Neighborhood Network Watch Wordlist.
He also presented the preliminary NYC Chapter’s Fall 2007 Findings.
Most commonly this includes, emails, instant messages, and web pages that have been browsed.
In fact we can even reconstruct images transmitted.
In fact privacy is a relative term with a definition that is constantly being redefined.
Especially so in the highly technologically mediated world we live in today.
If you have any additional questions please email us at contact@dhsnnw.org
A campaign aimed at rooting out the sources of terrorist or threatening network traffic.
It seems only appropriate that the first chapter of the Neighborhood Network Watch.
Neighborhood Network Watch provides these links and pointers solely for our users’ information and convenience.
The Neighborhood Network Watch Keyword Analysis Application (NNWKAA).
This application is employed by the Neighborhood Network Watch.
One new technology recently released by Apple Inc. is the iPhone and the iPod Touch.
Proprietors of the networks are typically alerted with this information by direct contact.
The Neighborhood Network Watch is firmly committed to being a contributing member of the community.
The Neighborhood Network Watch makes use of two main software applications.
Neighborhood Network Watch provides these links as a service to our users.
So, far in comparison with the previous wordlist we have seen improved yields.
A campaign aimed at rooting out the sources of terrorist or threatening network traffic.
Some of the networks included were also part of the Spring 2007 collections.
These networks have had a second set of collection samples taken.
In all twenty-eight networks were profiled.
The networks are listed below: Soy Luck Club Starbucks- Greenwhich Starbucks-W.
Union Square Park Tea Spot Washington Square Park-missing Starbucks- Washington Sq.
So, far in comparison with the previous wordlist we have seen improved yields.
With these correlations and trends regions are identified within the community/communities where these networks are situated.
A campaign aimed at rooting out the sources of terrorist or threatening network traffic.
The first operations from the spring of 2007 were made public in May of 2007.
Currently the Neighborhood Network Watch investigates.
If you have any additional questions please email us at contact@dhsnnw.org
It is also actively recruiting members for the work of the Neighborhood Network Watch.
He had envisioned a community-based group that would monitor our nations public networks.
These optional items can be obtained relatively easily for relatively low costs.
With such low entry barriers to conducting operations on behalf of the Neighborhood Network Watch.
One new technology recently released by Apple Inc. is the iPhone and the iPod Touch.
Neighborhood Network Watch provides these links and pointers solely for our users’ information and convenience.
In Summary we have looked at twenty-eight different networks.
The section below explains how we handle and collect technical information when you visit our website.
This information may also be shared if there is a potential concern regarding national security.
A campaign aimed at rooting out the sources of terrorist or threatening network traffic.
Neighborhood Network Watch Keyword Analysis Application and Neighborhood Network Watch Wordlist.
The Minority Ethics Commission is here to make sure that this sort of diversity continues.
This information may also be shared if there is a potential concern regarding national security.
The Neighborhood Network Watch Network Threat Advisory System.
Proprietors of the networks are typically alerted with this information by direct contact.
With these correlations and trends regions are identified within the community/communities where these networks are situated.

6 ngrams

We collect personal information like names or addresses when you visit our website.
This information never identifies who you are.
This information is only used to help us make the site more useful for you.
Cookies When you visit some websites, their webservers generate pieces of information known as cookies.
Some cookies collect personal information to recognize your computer in the future.
This is the case at Neighborhood Network Watch, where we use persistent cookies.
This information may also be shared if there is a potential concern regarding national security.
We reserve the right to create individual profiles and give these profiles to any governing body.
The Neighborhood Network Watch never collects information for commercial marketing.
Neighborhood Network Watch provides these links and pointers solely for our users’ information and convenience.
The Neighborhood Network Watch is always looking to incorporate new technologies into its everyday operations.
The Neighborhood Network Watch becomes the eyes and ears within the networks of America’s communities.
It is based though on how lobster’s eyes function to see through silt.
The light from these cells is then focused to form a single, intensified image.
So, far in comparison with the previous wordlist we have seen improved yields.
Some of the networks included were also part of the Spring 2007 collections.
These networks have had a second set of collection samples taken.
In all twenty-eight networks were profiled.
The networks are listed below: Soy Luck Club Starbucks- Greenwhich Starbucks-W.
Union Square Park Tea Spot Washington Square Park-missing Starbucks- Washington Sq.
So, far in comparison with the previous wordlist we have seen improved yields.
Using this wordlist as a basis, the Neighborhood Network Watch has carried out.
A triumph for the safety of our nation and our communities.
Most commonly this includes, emails, instant messages, and web pages that have been browsed.
In fact we can even reconstruct images transmitted.
In fact privacy is a relative term with a definition that is constantly being redefined.
Especially so in the highly technologically mediated world we live in today.
If you have any additional questions please email us at contact@dhsnnw.org
To learn about the technology used by NICD teams please visit the technology section.
He also presented the preliminary NYC Chapter’s Fall 2007 Findings.
One new technology recently released by Apple Inc. is the iPhone and the iPod Touch.
This information may also be shared if there is a potential concern regarding national security.
We reserve the right to create individual profiles and give these profiles to any governing body.
The Neighborhood Network Watch never collects information for commercial marketing.
Neighborhood Network Watch provides these links and pointers solely for our users’ information and convenience.
Neighborhood Network Watch provides these links as a service to our users.
The Neighborhood Network Watch is not responsible for transmissions users receive from linked websites.
Users must request such authorization from the sponsor of the linked website.
As well as helping to build stronger communities and a stronger nation.
Most commonly this includes, emails, instant messages, and web pages that have been browsed.
In fact we can even reconstruct images transmitted.
In fact privacy is a relative term with a definition that is constantly being redefined.
Especially so in the highly technologically mediated world we live in today.
If you have any additional questions please email us at contact@dhsnnw.org
The Minority Ethics Commission is here to make sure that this sort of diversity continues.
The Neighborhood Network Watch becomes the eyes and ears within the networks of America’s communities.
Most commonly this includes, emails, instant messages, and web pages that have been browsed.
In fact we can even reconstruct images transmitted.
In fact privacy is a relative term with a definition that is constantly being redefined.
Especially so in the highly technologically mediated world we live in today.

Midterm - State of the NNWKAA

March 8th, 2008 by ecm292

For the midterm I continued to work on the Neighborhood Network Watch Keyword Analysis Application (NNWKAA). The current version of the NNWKAA (v3) is for all intents and purposes a complete rewrite of the previously used NNWKAA, which was built in the Processing environment. I will now go through the major differences and changes in the program.

Previously the NNWKAA made use of strings and string arrays to handle all the incoming data from the cap files and the Neighborhood Network Watch Keyword List (NNWKL). The NNWKAA v3 now handles all data with hash tables and string buffers, making the application as a whole much more efficient with memory as well as speeding it up significantly. The analysis times have improved dramatically over the previous version. I do not have timed results currently, and at some point a time comparison will more than likely be made, but for now it can be said that it is way faster than the older version.
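The difference can be pictured with a small Java sketch: tokens are built in one reusable buffer instead of many intermediate strings, and keyword checks become hash lookups instead of scans over an array. This is an illustration under assumptions, not the v3 code. The names BufferTokenizerSketch, tokenize, and countFlagged are hypothetical, and StringBuilder stands in for the string buffers mentioned above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of buffer-based tokenizing and hash-based keyword lookup.
class BufferTokenizerSketch {

    /** Splits raw text into lower-cased runs of letters, reusing a single buffer. */
    static List<String> tokenize(CharSequence raw) {
        List<String> tokens = new ArrayList<>();
        StringBuilder word = new StringBuilder();
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            if (Character.isLetter(c)) {
                word.append(Character.toLowerCase(c));
            } else if (word.length() > 0) {
                tokens.add(word.toString());
                word.setLength(0); // reset the buffer instead of allocating a new one
            }
        }
        if (word.length() > 0) tokens.add(word.toString());
        return tokens;
    }

    /** Counts tokens that are on the keyword list; each check is one hash lookup. */
    static int countFlagged(List<String> tokens, Set<String> keywords) {
        int flagged = 0;
        for (String token : tokens) {
            if (keywords.contains(token)) flagged++;
        }
        return flagged;
    }

    public static void main(String[] args) {
        List<String> tokens = tokenize("Hello, world! 42x");
        Set<String> keywords = new HashSet<>(Arrays.asList("world"));
        System.out.println(tokens + " flagged: " + countFlagged(tokens, keywords));
    }
}
```

With a string array, every token has to be compared against every keyword in turn; with a hash set the cost of a lookup stays roughly constant as the keyword list grows.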

The NNWKAA v3 has moved away from a long-form, single-object app to a fully object-oriented Java app, which has helped with the handling and compartmentalizing of data. The NNWKL is the latest version, which uses the ECHELON wordlist as its backbone along with FBI / INTERPOL resources. A dictionary cross check has been added that checks whether words are actually valid words and ignores them if they are not, since network traffic dumps contain a significant amount of useless garbage. Together with the introduction of an HTML / CSS tag stripper, this has dramatically changed the results, since much of this garbage and gibberish is now removed very effectively. These additions have reduced the amount of extraneous formatting information so the exact contents of emails, IMs, and web sites can be checked more effectively for threats to national security and terrorist-related items.
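A tag stripper of this sort can be approximated with two regular expressions, as in the Java sketch below. It is a simplification and not the v3 stripper: it assumes the CSS to remove lives inside style blocks, it does not handle packet headers or malformed markup, and the names TagStripperSketch and strip are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical sketch of an HTML/CSS tag stripper built on regular expressions.
class TagStripperSketch {

    // A whole <style>...</style> block, including the CSS rules inside it.
    private static final Pattern STYLE_BLOCK =
            Pattern.compile("(?is)<style\\b[^>]*>.*?</style\\s*>");

    // Any remaining tag, opening or closing.
    private static final Pattern TAG = Pattern.compile("<[^>]+>");

    /** Removes style blocks and tags, then collapses runs of whitespace. */
    static String strip(String raw) {
        String withoutCss = STYLE_BLOCK.matcher(raw).replaceAll(" ");
        String withoutTags = TAG.matcher(withoutCss).replaceAll(" ");
        return withoutTags.replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        String page = "<html><style>p { color: red; }</style><p>Hello <b>world</b></p></html>";
        System.out.println(strip(page));
    }
}
```

Tags are replaced with a space rather than removed outright so that words on either side of a tag do not fuse into a single token that would then fail the dictionary check.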

So far, with just the introduction of the dictionary cross checking and the HTML / CSS stripper, we have seen a marked increase in the amount of flagged words, upwards of 16% in some test cases. Multiple types of reports can be output, including Excel spreadsheets of the concordance of the incoming cap file with the word counts, flagged status, and positions. Dictionary status could also be easily implemented. A final statistics report is generated with the name of the incoming file, the number of flagged words, the total number of words, and the terror percentage. A reworking of the Network Threat Advisory System criteria will be necessary to address the sharp changes in the amounts of flagged words; therefore a suggested Network Threat Advisory Level is not being included in the reports that are generated.

NNW Network Threat Advisory System

Here is the source code for the nnwkaa main, the cap file loader, the generic file loaders, and the modified word class. Here are also some example results, B-Cup Café Results & Excel File (very large 25mb) and a result doc from the previous version of the NNWKAA, NYU’s Stern on the Move network, and 2 networks that are in my apartment building or in neighboring buildings, Hot Air Balloon Results & Excel, and Netgear Results & Excel.

NNWKAA 3.0

February 26th, 2008 by ecm292

The Neighborhood Network Watch Keyword Analysis Application (NNWKAA), is an application I began writing back in the spring of 2007 in the Every Bit You Make class, for what would eventually become my thesis project The Neighborhood Network Watch (NNW). First what is the NNW? Here is the mission statement for the group:

“The Neighborhood Network Watch is a community based organization that seeks to safeguard our nation’s public networks by establishing awareness of network traffic that may compromise or otherwise be detrimental to national security, through means of data collection and analysis.”

The NNWKAA is used to analyze the raw data dumps from packet sniffing of public networks. The program uses the Neighborhood Network Watch Keyword List (NNWKL) to look for flagged words that may indicate threats to national security and/or terrorism. The NNWKL is based on the ECHELON word list (used for joint US, UK, Canadian, New Zealander and Australian (AUSCANZUKUS) signal intelligence collection and analysis), along with contemporary updates from FBI and INTERPOL resources. The first version of the NNWKAA had no visual output or interface: all statistics were displayed in the console, and it used a single pass that checked every single token for a keyword match. The program as a whole was highly inefficient, slow, and limited.

A new version of the NNWKAA app began development in the fall of 2007, with the addition of standardized text output files for the results and a graphical output and feedback display showing its current state of processing the data. The code was cleaned up considerably; however, it still remained slow and inefficient.

For the third version of the NNWKAA, I plan to implement either hash tables or tree maps to speed up lookups compared with the current version, which must traverse the whole NNWKL. I plan to use string buffers instead of straight strings to tidy things up and to prevent the extraneous generation of strings. An HTML/CSS tag stripper will be introduced to attempt to remove some of the extraneous information that is inherently found in network dump files (.cap files). I would like to implement a dictionary to check whether tokens that are not found in the NNWKL are indeed words, as an attempt to stave off the gibberish that is often found in dump files and that may be diluting the statistical results towards lower amounts of terror. Other possibilities are tracking the most common flagged words (the top 10-20), contextual cross checking, and self-learning and growing of the wordlist itself. I also want improved graphical output along with an easier, more automated way of inputting files into the NNWKAA for analysis. This is the current road map for the third version of the NNWKAA.

For the midterm I plan to work on implementing the hash table/tree maps for holding the NNWKL, as well as the use of string buffers. Potentially also the dictionary cross-checking, the HTML/CSS stripper, and a better way of inputting files into the application instead of hard-coding them. Possibly also counts of the number of times each flagged keyword is found.

NNWKAA 3.0 Preliminary Diagram

Bayesian with tcpdumps & the Plan for NNWKAA Overhaul

February 19th, 2008 by ecm292

So, most of what I did this week was look at how the Bayesian code was built and try running it on cap (tcpdump) files that I had analyzed previously with the Neighborhood Network Watch Keyword Analysis Application (NNWKAA). This proved difficult since I had to work from my laptop this week, as my main machine (PMacG5) went down yet again; less RAM and less CPU power meant lots of running out of memory and sometimes not running at all. However, I was able to dig out some really small .cap files that had been run through the NNWKAA before, and those worked.

So, I fed the system one file that had been given a rating of High for the amount of potentially dangerous activity (threats to national security and/or terrorist related) found being accessed or transmitted from the given network, and another file that had been rated Guarded, since all the networks that had been rated Low were too large to be processed. The first test file to be processed had very overt and large amounts of terrorist-related or national-security-related network activity. The result was not spam, i.e. non-threatening. Report results.

After this I added the Neighborhood Network Watch Word List (NNWWL) to supplement the bad (spam) side of things. This resulted in a marked jump in the flagged words, and the label changed from non-spam (non-threatening) to spam (threatening). Report results.

The third version introduced an HTML and CSS tag stripper to process the incoming training files and the input file. This maintained the spam/threatening result while increasing the threat level for keywords found in the NNWWL. Report results.

The last version contained all of the above, but a new test cap file was generated to include fairly benign web browsing, such as browsing popular news sites, Facebook, sending and receiving email, Wikipedia usage, weather checking, and some Amazon browsing. After being run through the program, the file was found to be spam/threatening, which was odd, but not so odd in some ways. Here are the last versions of the code for the Bayesian and the Spam Filter. Report results.

This brings me to the revamp of the NNWKAA that I plan to work on during the course of this class. The biggest aspects are making it more efficient by reducing the cross-checking time, making it a bit smarter, and removing more of the gibberish that is often found in cap files from network traffic dumps. The application would first load the NNWWL into a hash or tree, and then load a dictionary into another hash or tree. It would then take the incoming cap file, break it down, attempt to remove tags and common junk, and place the individual words into a hash or tree. From here it would compare each word against the NNWWL; on a match the word would be flagged, a counter would be incremented, and the application would go on to the next word. If there is no match, the word would be checked against the dictionary; if it is not found there either, it would be stored to see whether it occurs multiple times, with a minimum-occurrence threshold that it must reach to avoid being purged. Once this is done, everything that does not meet the criteria of NNWWL, dictionary, or minimum occurrences would be purged, and various calculations would be done on the remaining information: first, a total count of the flagged words along with the top 10-20 occurring words; second, the total word count; third, the percentage of flagged words to total words found; and finally the likely or suggested NNW Network Advisory (Severe-Red, High-Orange, Elevated-Yellow, Guarded-Blue, Low-Green). Here is a preliminary crude breakdown of the process and flow of the new NNWKAA: new-nnwkaa-programming-diagram.png.
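As a rough sketch of the flow just described (flag, dictionary check, occurrence threshold, percentage, advisory), something like the following could work. Everything here is an assumption made for illustration: the class `TerrorPercentageSketch`, its method names, and above all the percentage cut-offs in `advisoryFor`, which are invented since the real ones are not stated in the post.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the planned flow; not the actual NNWKAA code.
final class TerrorPercentageSketch {

    // Flagged count, total count, and the percentage derived from them.
    static final class Result {
        final int flagged;
        final int total;
        final double percent;

        Result(int flagged, int total) {
            this.flagged = flagged;
            this.total = total;
            this.percent = total == 0 ? 0.0 : 100.0 * flagged / total;
        }
    }

    // keywords and dictionary are assumed to contain lower-case entries.
    static Result analyze(List<String> tokens, Set<String> keywords,
                          Set<String> dictionary, int minOccurrences) {
        Map<String, Integer> unknownCounts = new HashMap<>();
        int flagged = 0;
        int total = 0;
        for (String raw : tokens) {
            String word = raw.toLowerCase(Locale.ROOT);
            if (keywords.contains(word)) {
                flagged++;          // match in the keyword list: flag and count
                total++;
            } else if (dictionary.contains(word)) {
                total++;            // ordinary dictionary word: count only
            } else {
                unknownCounts.merge(word, 1, Integer::sum); // hold for the threshold check
            }
        }
        // Unknown tokens are kept only if they recur often enough; the rest are purged.
        for (int count : unknownCounts.values()) {
            if (count >= minOccurrences) {
                total += count;
            }
        }
        return new Result(flagged, total);
    }

    // Invented cut-offs (in percent), purely for illustration.
    static String advisoryFor(double percent) {
        if (percent >= 20.0) return "Severe-Red";
        if (percent >= 10.0) return "High-Orange";
        if (percent >= 5.0) return "Elevated-Yellow";
        if (percent >= 1.0) return "Guarded-Blue";
        return "Low-Green";
    }
}
```

Purging unknown tokens shrinks the total word count, so the same number of flagged words yields a higher percentage, which is the inflating effect the dictionary check is meant to have.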

Week 3-Concordance of a .Cap

February 12th, 2008 by ecm292

So, now that I no longer have the out-of-memory heap problems that I had when loading .cap tcpdump files, I decided to run some cap files through. The program first strips out any HTML tags that are in the .cap file; these are the easiest thing to remove, since there's a lot of other garbage and it's nigh on impossible to sort out. It then goes through and populates the tree and counts the occurrences. After this it writes the words out to a string buffer, which is dumped to a string and then out to a text file. There are a ton of things that are not words, hence the 9000 or so unique things loaded into the concordance. The next useful thing would be to compare this with both a dictionary and the NNW keyword list, checking first whether an entry is in the dictionary and, if not, whether it is in the NNW keyword list. If it is in neither, it would need to be purged in some way. So, here’s the code and here’s a small snippet of that very long text file.

Some from the e’s

elj 1
elq 1
else 37
elv 1
em 20
email 6
embed 2
emr 2
en 61
enabled 4
encj 1
enclosed 1
encoding 44
encourage 1
encyclopedia 2
end 6
ending 2
endlessnight 1
endpos 4
enhanced 1
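
A concordance along these lines can be sketched with a `TreeMap`, which keeps the words in alphabetical order like the snippet above. This is not the linked source code, just a minimal guess at the approach: the class `ConcordanceSketch`, the crude tag regex, and the letters-only tokenizing rule are all assumptions.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch; the tag regex and tokenizing rule are assumptions.
final class ConcordanceSketch {

    // Strip anything that looks like a tag, then count each remaining token.
    static TreeMap<String, Integer> count(String text) {
        String stripped = text.replaceAll("<[^>]*>", " ");
        TreeMap<String, Integer> counts = new TreeMap<>();
        for (String token : stripped.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Write "word count" lines into a buffer, then hand back one string
    // that could be written out to a text file.
    static String format(Map<String, Integer> counts) {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            out.append(entry.getKey()).append(' ').append(entry.getValue()).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(format(count("<b>else</b> email else")));
    }
}
```

A `StringBuilder` is used here in place of `StringBuffer`; the two have the same append API, and the unsynchronized one is enough for single-threaded output.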

Inverse HTML Tag Stripper

February 5th, 2008 by ecm292

Instead of removing the HTML tags, I was trying to get everything between anything that could potentially constitute a tag: your standard “>” and “<”. So, here’s this done on an article off the BBC. However, I have not figured out how to dump the output of the matcher directly back into a buffer or string. I’m sure it’s something really bloody simple. I was going to run it through a second pass to remove the actual >’s and <’s, but I couldn’t pass the matched text from the first matcher into a form that the second matcher could work on. Here is the source code.
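
One possible way around the matcher problem, sketched under assumptions rather than taken from the linked source: put a capturing group around the text between “>” and “<” and append `group(1)` to a buffer on every `find()`. That also makes the second pass unnecessary, since the brackets sit outside the group. The class name `InverseTagStripperSketch` and the exact pattern are made up for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; the pattern is one guess at "everything between > and <".
final class InverseTagStripperSketch {

    // Group 1 captures the text between a '>' and the next '<'.
    private static final Pattern BETWEEN_TAGS = Pattern.compile(">([^<>]*)<");

    static String textBetweenTags(String html) {
        StringBuilder out = new StringBuilder();
        Matcher matcher = BETWEEN_TAGS.matcher(html);
        while (matcher.find()) {
            // group(1) excludes the brackets themselves.
            String chunk = matcher.group(1).trim();
            if (!chunk.isEmpty()) {
                if (out.length() > 0) {
                    out.append(' ');
                }
                out.append(chunk);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(textBetweenTags("<p>Hello</p><p>World</p>"));
    }
}
```

Text before the first tag or after the last one is not captured by this pattern, which matches the "between tags" idea but would miss content in a file that does not start and end with a tag.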

A-Z Assignment WK1

January 29th, 2008 by ecm292

Here I am running Assign1b in the terminal on the Bible. The program looks for a keyword, in this case “the.” It then looks at the first letter of the next word and outputs it. I forgot to add an input argument so you can type in the keyword to look for; easily done later.
Running Assignment 1
It prints out in the console the word first, then the first letter of the word. The first letter of the word is retained and dumped to the text file.
Assign1b-Output
Here is the NY Times output.
NY Times Output-Assign1b
The source code can be found here
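
The idea behind Assign1b can be sketched as follows. This is not the linked source: the class `NextLetterSketch` and its method are invented for the example, the keyword is passed in as a parameter (the input argument mentioned as still to do), and whitespace splitting is assumed as the tokenizing rule.

```java
// Hypothetical sketch of the keyword / next-letter idea; not the Assign1b source.
final class NextLetterSketch {

    // Collect the first letter of every word that directly follows the keyword.
    static String firstLettersAfter(String text, String keyword) {
        StringBuilder letters = new StringBuilder();
        String[] words = text.split("\\s+");
        for (int i = 0; i + 1 < words.length; i++) {
            if (words[i].equalsIgnoreCase(keyword) && !words[i + 1].isEmpty()) {
                letters.append(words[i + 1].charAt(0));
            }
        }
        return letters.toString();
    }

    public static void main(String[] args) {
        String sample = "In the beginning God created the heaven and the earth";
        System.out.println(firstLettersAfter(sample, "the"));
    }
}
```

Punctuation stays attached to words under this splitting rule, so “the,” would not match the keyword “the”.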

A-Z Dumping Ground

January 25th, 2008 by ecm292

So, this is going to become the dumping ground I think for Programming A-Z.