Removing Point Outliers

November 19th, 2007 rupert

In my previous post on removing point outliers, I tried using R and PL/R in PostgreSQL. Although I only scratched the surface of R’s spatial analysis capabilities, I needed something more extensible for my internet purposes. I decided to use Python for its pragmatism and ease of programming. The idea was to pull the vector points out of PostGIS, process them with an algorithm (ideally a minimum convex hull, though that could get expensive later on) and then remove the outliers.

NumPy, a scientific Python library, fits in easily, providing basic functions for array computations such as mean, median, standard deviation and variance. For now, the algorithm uses a 90% threshold, taken from “Dealing with ‘Outliers’: Maintain Your Data’s Integrity”.
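Each of those statistics is a single NumPy call. Here is a minimal sketch on the 10 scores used in the example below. `scores` is just an illustrative name. Note that NumPy's `std()` and `var()` default to the population formulas (`ddof=0`):

```python
import numpy as np

# The 10 sample scores from the worked example
scores = np.array([8, 25, 35, 41, 50, 75, 75, 79, 92, 129], dtype=float)

print("Mean:", scores.mean())        # arithmetic mean
print("Median:", np.median(scores))  # even count: average of the two middle values
print("Std:", scores.std())          # population standard deviation (ddof=0)
print("Var:", scores.var())          # population variance (ddof=0)
```

Pass `ddof=1` to `std()` or `var()` if you want the sample versions instead.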

Consider this collection of 10 scores, sorted from smallest to largest:
 
  x    8 25 35 41 50   75 75 79 92 129
                     ^
The median of these 10 values of x is 62.5, computed as (75+50)/2.
 
Next, calculate the absolute deviation of each original value from the median:
 
   x     med  abs_dev
 
  50    62.5    12.5
  75    62.5    12.5
  75    62.5    12.5
  79    62.5    16.5
  41    62.5    21.5 ->|
  35    62.5    27.5 ->|  MEDIAN(abs_dev) = 24.5 = (21.5+27.5)/2
  92    62.5    29.5
  25    62.5    37.5
   8    62.5    54.5
 129    62.5    66.5
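Both median steps so far map directly onto NumPy. Here is a minimal sketch on the same 10 scores (variable names are illustrative):

```python
import numpy as np

scores = np.array([8, 25, 35, 41, 50, 75, 75, 79, 92, 129], dtype=float)

med = np.median(scores)          # median of the data: (50 + 75) / 2
abs_dev = np.abs(scores - med)   # absolute deviation of each value from the median
mad = np.median(abs_dev)         # median of the absolute deviations (MAD)

print(np.sort(abs_dev))
print("MEDIAN(abs_dev) =", mad)  # 24.5
```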
 
Next, compute a test statistic: the column of absolute deviations computed above, divided by the median of those absolute deviations:
 
Test Stat = abs_dev / (Med of abs Dev)
 
                           Med of       Test
  x    Median  abs_dev    abs dev    Statistic  Outlier?   
 
  8     62.5     54.5       24.5      2.22449
 25     62.5     37.5       24.5      1.53061
 35     62.5     27.5       24.5      1.12245
 41     62.5     21.5       24.5      0.87755
 50     62.5     12.5       24.5      0.51020
 75     62.5     12.5       24.5      0.51020
 75     62.5     12.5       24.5      0.51020
 79     62.5     16.5       24.5      0.67347
 92     62.5     29.5       24.5      1.20408
129     62.5     66.5       24.5      2.71429       Yes
 
The decision rule then compares this test statistic with an arbitrary cutoff point. A cutoff of 2.5 is conservative; 4.5 or 5 is more rigorous. If the test statistic exceeds the critical value (here 2.5), the observed value is defined as an outlier. With this cutoff, the data above have one outlier (x = 129).
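Putting the steps above together, here is a minimal sketch of the decision rule as a NumPy function. The name `mad_outliers` and the choice to flag nothing when the MAD is zero are my own assumptions, not part of the source article:

```python
import numpy as np

def mad_outliers(values, cutoff=2.5):
    """Return a boolean mask: True where abs_dev / MAD exceeds `cutoff`."""
    values = np.asarray(values, dtype=float)
    med = np.median(values)
    abs_dev = np.abs(values - med)     # absolute deviation from the median
    mad = np.median(abs_dev)           # median absolute deviation
    if mad == 0:
        # Assumption: when the MAD is zero (e.g. most values identical) the
        # ratio is undefined, so flag nothing instead of dividing by zero.
        return np.zeros(values.shape, dtype=bool)
    test_stat = abs_dev / mad
    return test_stat > cutoff

scores = np.array([8, 25, 35, 41, 50, 75, 75, 79, 92, 129])
flags = mad_outliers(scores, cutoff=2.5)
print(scores[flags])  # [129]
```

Raising `cutoff` to 4.5 flags nothing in this data set, since the largest test statistic is about 2.71.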

Implementing this in Python…

P = 116.32977 39.905319,116.329906 39.90464,116.329907 39.90464,116.329918 39.904675,116.330047 39.904683

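    import numpy as np

    multipoints = getPointsString()    # author's helper (not shown): "x y,x y,..." from PostGIS
    print(multipoints)

    pobj = getPointArray(multipoints)  # author's helper (not shown): wraps the parsed points
    p = pobj.p    # N x 2 array of (x, y) coordinates
    x = pobj.x
    y = pobj.y

    #print("Median:", np.median(p, axis=0))
    #print("Std:", p.std(axis=0))
    #print("Min:", p.min(axis=0))
    #print("Max:", p.max(axis=0))

    # Work per coordinate (axis=0); current NumPy's median() flattens the array by default
    pmed = np.median(p, axis=0)             # median x and median y
    pdev_abs = np.abs(p - pmed)             # absolute deviation from the median
    med_pdev = np.median(pdev_abs, axis=0)  # median absolute deviation per coordinate
    pfinal = pdev_abs / med_pdev            # test statistic, compared against the cutoff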

Here getPointsString() returns a list of point geometries as a string, such as “116.32977 39.905319,116.329906 39.90464,116.329907 39.90464,116.329918 39.904675,116.330047 39.904683..”. We can easily get the median, standard deviation, minimum (min) and maximum (max) values of the array.

[Figure: 2007-11-19_102428.png]

Here the original points are marked in red, while the points remaining after outlier removal are colored green.
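The code above stops at the test statistic pfinal and doesn't show the removal step itself. As a minimal sketch, assuming each coordinate is tested independently and a point is dropped when either its x or its y statistic exceeds the 2.5 cutoff, the whole pipeline might look like this. parse_points() and remove_outliers() are hypothetical names; parse_points() only stands in for the author's getPointArray(), whose code isn't shown:

```python
import numpy as np

def parse_points(multipoints):
    """Hypothetical stand-in for getPointArray(): "x y,x y,..." -> N x 2 float array."""
    return np.array(
        [[float(v) for v in pair.split()] for pair in multipoints.split(",") if pair.strip()]
    )

def remove_outliers(p, cutoff=2.5):
    """Split points into (kept, removed) using a per-coordinate MAD test.

    Assumption: a point is removed if its x OR y test statistic exceeds `cutoff`.
    """
    pmed = np.median(p, axis=0)             # median x and median y
    pdev_abs = np.abs(p - pmed)             # absolute deviations, N x 2
    med_pdev = np.median(pdev_abs, axis=0)  # MAD per coordinate
    med_pdev[med_pdev == 0] = np.inf        # zero MAD: avoid dividing by zero, flag nothing in that column
    pfinal = pdev_abs / med_pdev            # test statistic per coordinate
    keep = (pfinal <= cutoff).all(axis=1)   # keep only points that pass on both coordinates
    return p[keep], p[~keep]

sample = ("116.32977 39.905319,116.329906 39.90464,116.329907 39.90464,"
          "116.329918 39.904675,116.330047 39.904683")
kept, removed = remove_outliers(parse_points(sample))
print(len(kept), "kept,", len(removed), "removed")  # 3 kept, 2 removed
```

On a sample this small the MAD is tiny, so the test is aggressive; treat the cutoff as something to tune against real data.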

Categories: python

Geometric Algorithms in GIS

November 16th, 2007 rupert

Here are a few geometric algorithms used in GIS:

  • Convex hull problem: for a set of points, determine the smallest convex set that contains them all.
  • Line segment intersection: for a set of line segments, determine all intersections.
  • Voronoi diagram computation: for a set of points, determine the subdivision of the plane into cells such that inside some cell, one and the same point of the set is closest.
  • Delaunay triangulation: for a set of points, determine a planar subdivision by creating edges between the input points in such a way that no two edges intersect, all faces are triangles, no more edges can be added with the given constraints, and no circumcircle of any triangle contains an input point in its interior.
  • Minkowski sum: for two simple polygons P and Q, compute the shape that consists of exactly the sums p + q of every point p of P and every point q of Q, where sum is interpreted as the vector sum.
  • Rectangular range search: for a set of points in the plane, design a data structure on those points, such that for every axis-parallel query rectangle, all points in the data structure that lie in the query rectangle can be reported efficiently. Algorithms are needed for the construction of the data structure and for the execution of a query.
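For the first problem in the list, here is a minimal sketch of one standard way to compute a convex hull, Andrew's monotone chain algorithm, which runs in O(n log n) time because of the initial sort. The function name convex_hull is illustrative, and this is not code from the reference below:

```python
def convex_hull(points):
    """Andrew's monotone chain: hull vertices in counter-clockwise order.

    `points` is an iterable of (x, y) tuples; collinear boundary points are dropped.
    """
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # z-component of (a - o) x (b - o); > 0 means a counter-clockwise turn
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)

    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)

    # The last point of each chain is the first point of the other, so drop it
    return lower[:-1] + upper[:-1]

square = [(0, 0), (2, 0), (2, 2), (0, 2), (1, 1), (1, 0)]
print(convex_hull(square))  # [(0, 0), (2, 0), (2, 2), (0, 2)]
```

The interior point (1, 1) and the collinear edge point (1, 0) are both left out of the hull.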

[Figure: 2007-11-16_062422.png]

Reference:
M. van Kreveld, Computational Geometry: Its Objectives and Relation to GIS, Institute of Information and Computing Sciences, Utrecht University.

Categories: GIS

Installing R on Windows and Debian

November 16th, 2007 rupert

R is a statistical computing package. For an overview, please see www.r-project.org.
My intention was to remove the point outliers from a given set of point geometries.

I recently installed R on both Windows XP and Debian. Regina’s www.bostongis.com has excellent tutorials on getting started with R. I suggest you head first to “PLR Part 1: Up and Running with PL/R (PLR) in PostgreSQL: An almost Idiot’s Guide” to get you started.

The install instructions for Windows work flawlessly. I had to stay on R-2.5, though, since I plan to use RPy (a Python interface to R); see details below. To install R on Debian, there are a couple of settings that we need to take care of:

1. Install r-base
sudo apt-get install r-base

2. Install plr on postgres
sudo apt-get install postgresql-8.2-plr

3. Enable PL/R in a database (here, beijing)
psql -d beijing -U lbs -h 127.0.0.1 < /usr/share/postgresql/8.2/plr.sql

4. Set the R_HOME environment variable by adding this line to /etc/postgresql/8.2/main/environment:
R_HOME='/usr/lib/R'

5. Restart Debian.

RPy (a Python interface to R) is another way to use R, this time from Python. I installed it on both Windows and Debian. Note that I reverted to R-2.5 on Windows for compatibility with RPy. On Debian, I’m currently using R-2.6.

For the Windows binary installation:

1. Read the RPy Main Site

2. Install prerequisites:

- NumPy
- Win32 Extensions Download

3. Afterwards, install the main package, RPy Download

On Debian, it’s straightforward:
sudo apt-get install python-rpy

Categories: debian, postgis, postgres

Serving ASP pages in Linux

November 14th, 2007 rupert

I never intended to do what the title describes. However, since we need it temporarily at work, I had to brush up on my Linux skills to set this up. The principal reference is http://www.apache-asp.org/config.html.

In Debian,

1. Install libapache2-mod-perl2 and libapache-asp-perl:


sudo apt-get install libapache2-mod-perl2
sudo apt-get install libapache-asp-perl

2. Add the following configuration:
sudo vi /etc/apache2/sites-available/default

    PerlModule  Apache::ASP
    <Files ~ (\.asp)>
      SetHandler  perl-script
      PerlHandler Apache::ASP
      PerlSetVar  Global .
      PerlSetVar  StateDir /data/asp
    </Files>

3. Restart Apache.

4. Make sure the state directory /data/asp has the correct permissions (writable by www-data):

drwxrwxr-x 4 www-data www-data 4096 2007-11-13 15:33 asp

5. If you encounter the following error:

[Tue Nov 13 15:12:36 2007] [error] [client 127.0.0.1] Can't locate object method "get" via package "APR::Table" at /usr/share/perl5/Apache/ASP.pm line 2016.\n at /usr/share/perl5/Apache/ASP.pm line 2016\n\tApache::ASP::get_dir_config('APR::Table=HASH(0x81d96f8)', 'Global') called at /usr/share/perl5/Apache/ASP.pm line 275\n\tApache::ASP::new('Apache::ASP', 'Apache2::RequestRec=SCALAR(0x81d9764)', '/data/wwwroot/asp/test.asp') called at /usr/share/perl5/Apache/ASP.pm line 183\n\tApache::ASP::handler('Apache2::RequestRec=SCALAR(0x81d9764)') called at -e line 0\n\teval {...} called at -e line 0\n, referer: http://127.0.0.1/asp/

Read nable-post, which patches /usr/share/perl5/Apache/ASP.pm as follows:

Lines 65-71:
   if($ENV{MOD_PERL}) {
   $ModPerl2 = ($mod_perl::VERSION >= 1.99);
   if($ModPerl2) {
       eval "use Apache::ASP::ApacheCommon ();";
       die($@) if $@;
   }
   }

become
   if($ENV{MOD_PERL}) {
   $ModPerl2 = ($mod_perl::VERSION >= 1.99);
   my $ver = $mod_perl::VERSION;
   if ($ver eq "") { $ver = $ENV{MOD_PERL_API_VERSION}; }
   $ModPerl2 = ($ver >= 1.99);
   if($ModPerl2) {
       eval "use Apache::ASP::ApacheCommon ();";
       die($@) if $@;
   }
   }

6. If step 5 still doesn’t work:

a. Add this to /etc/apache2/conf.d/perl.conf:

PerlRequire /etc/apache2/startup.pl

b. Create /etc/apache2/startup.pl containing:

#!/usr/bin/perl
use Apache2::compat;
1;

7. To test, paste the following into test.asp under your web root:

  <!-- sample here -->
 
  For loop incrementing font size:
 
  <% for(1..5) { %>
	<!-- iterated html text -->
	<font size="<%=$_%>"> Size = <%=$_%> </font> 
  <% } %>
 
  <!-- end sample here -->
Categories: debian

Too many authentication failures for user

November 11th, 2007 rupert

Finally found the fix: http://netthink.com/archives/191. In short, edit ssh_config (the client configuration), not sshd_config.

You can also debug ssh while connecting by using the “-v” switch. For example:


ssh -v rupert@192.168.1.12

Categories: linux