<?xml version="1.0" encoding="utf-8"?>
<feed xml:lang="en" xmlns="http://www.w3.org/2005/Atom"><title>Recent changes to wiki</title><link href="https://sourceforge.net/p/aws-data-tools/wiki/" rel="alternate"/><link href="https://sourceforge.net/p/aws-data-tools/wiki/feed.atom" rel="self"/><id>https://sourceforge.net/p/aws-data-tools/wiki/</id><updated>2016-04-07T12:47:15.411000Z</updated><subtitle>Recent changes to wiki</subtitle><entry><title>Oracle_To_Redshift_Data_Loader modified by Alex Buzunov</title><link href="https://sourceforge.net/p/aws-data-tools/wiki/Oracle_To_Redshift_Data_Loader/" rel="alternate"/><published>2016-04-07T12:47:15.411000Z</published><updated>2016-04-07T12:47:15.411000Z</updated><author><name>Alex Buzunov</name><uri>https://sourceforge.net/u/alexbuz/</uri></author><id>https://sourceforge.net57241010b97892857d474504d7e661bf0203719d</id><summary type="html">&lt;div class="markdown_content"&gt;&lt;pre&gt;--- v1
+++ v2
@@ -11,7 +11,7 @@
  - It's executable (Oracle_To_Redshift_Loader.exe)  - no need for Python install.
  - It's 64 bit - it will work on any vanilla DOS for 64-bit Windows.
  - AWS Access Keys are not passed as arguments. 
- - Written using Python/boto/PyInstaller.
+ - Written using Python/boto/psycopg2/PyInstaller.

 ##Version
@@ -36,6 +36,7 @@
 - Data is loaded to Redshift using  COPY command
 - It's a Python/boto script
    * Boto S3 docs: http://boto.cloudhackers.com/en/latest/ref/s3.html
+   * psycopg2 docs: http://initd.org/psycopg/docs/ 
 - Executable is created using [pyInstaller] (http://www.pyinstaller.org/)

 ##Audience
&lt;/pre&gt;
&lt;/div&gt;</summary></entry><entry><title>Oracle_To_Redshift_Data_Loader modified by Alex Buzunov</title><link href="https://sourceforge.net/p/aws-data-tools/wiki/Oracle_To_Redshift_Data_Loader/" rel="alternate"/><published>2016-04-07T12:46:43.121000Z</published><updated>2016-04-07T12:46:43.121000Z</updated><author><name>Alex Buzunov</name><uri>https://sourceforge.net/u/alexbuz/</uri></author><id>https://sourceforge.net5d7eb01f13f7a02994bf5c6321bf54db4821be94</id><summary type="html">&lt;div class="markdown_content"&gt;&lt;h1 id="oracle-to-redshift-data-loader"&gt;Oracle-to-Redshift-Data-Loader&lt;/h1&gt;
&lt;p&gt;Lets you stream your Oracle table/query data to Amazon-Redshift from the Windows CLI (command line).&lt;/p&gt;
&lt;p&gt;Features:&lt;br/&gt;
 - Streams Oracle table data to Amazon-Redshift.&lt;br/&gt;
 - No need to create CSV extracts before load to Redshift.&lt;br/&gt;
 - Data stream is compressed while load to Redshift.&lt;br/&gt;
 - No need for Amazon AWS CLI.&lt;br/&gt;
 - Works from your OS Windows desktop (command line).&lt;br/&gt;
 - It's executable (Oracle_To_Redshift_Loader.exe)  - no need for Python install.&lt;br/&gt;
 - It's 64 bit - it will work on any vanilla DOS for 64-bit Windows.&lt;br/&gt;
 - AWS Access Keys are not passed as arguments. &lt;br/&gt;
 - Written using Python/boto/psycopg2/PyInstaller.&lt;/p&gt;
&lt;h2 id="version"&gt;Version&lt;/h2&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;OS&lt;/th&gt;
&lt;th&gt;Platform&lt;/th&gt;
&lt;th&gt;Version&lt;/th&gt;
&lt;th&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Windows&lt;/td&gt;
&lt;td&gt;64bit&lt;/td&gt;
&lt;td&gt;&lt;span&gt;[1.2 beta]&lt;/span&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;h2 id="purpose"&gt;Purpose&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Stream/pipe/load Oracle table data to Amazon-Redshift.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="how-it-works"&gt;How it works&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Tool connects to source Oracle DB and opens data pipe for reading.&lt;/li&gt;
&lt;li&gt;Data is pumped to S3 using multipart upload.&lt;/li&gt;
&lt;li&gt;Optional upload to Reduced Redundancy storage (not RR by default).&lt;/li&gt;
&lt;li&gt;Optional "make it public" after upload (private by default)&lt;/li&gt;
&lt;li&gt;If the bucket doesn't exist, it is created.&lt;/li&gt;
&lt;li&gt;You can control the region where the new bucket is created.&lt;/li&gt;
&lt;li&gt;Streamed data can be tee'd (dumped on disk) during load.&lt;/li&gt;
&lt;li&gt;If not set, the S3 key defaults to the query file name.&lt;/li&gt;
&lt;li&gt;Data is loaded to Redshift using  COPY command&lt;/li&gt;
&lt;li&gt;It's a Python/boto script&lt;ul&gt;
&lt;li&gt;Boto S3 docs: &lt;a href="http://boto.cloudhackers.com/en/latest/ref/s3.html" rel="nofollow"&gt;http://boto.cloudhackers.com/en/latest/ref/s3.html&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;psycopg2 docs: &lt;a href="http://initd.org/psycopg/docs/" rel="nofollow"&gt;http://initd.org/psycopg/docs/&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Executable is created using &lt;a href="http://www.pyinstaller.org/" rel="nofollow"&gt;pyInstaller&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
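The multipart pumping described above can be sketched with the legacy boto S3 API the tool uses. This is an illustrative sketch, not the tool's actual code; the bucket/key names and the chunking helper are assumptions.

```python
import io

# Roughly 20 MB, matching the documented s3_write_chunk_size default.
CHUNK_SIZE = 20971520

def iter_chunks(fp, chunk_size=CHUNK_SIZE):
    """Yield fixed-size chunks from a file-like object until it is exhausted."""
    while True:
        chunk = fp.read(chunk_size)
        if not chunk:
            break
        yield chunk

def multipart_upload(bucket_name, key_name, fp):
    """Pump a data stream to S3 part by part (boto 2 API)."""
    import boto  # legacy boto, as used by the tool
    conn = boto.connect_s3()  # reads AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY
    bucket = conn.get_bucket(bucket_name)
    mp = bucket.initiate_multipart_upload(key_name)
    try:
        # Part numbers are 1-based in the S3 multipart protocol.
        for part_num, chunk in enumerate(iter_chunks(fp), start=1):
            mp.upload_part_from_file(io.BytesIO(chunk), part_num)
        mp.complete_upload()
    except Exception:
        mp.cancel_upload()
        raise
```

Note that S3 requires every part except the last to be at least 5 MB, which the ~20 MB default chunk size satisfies.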
&lt;h2 id="audience"&gt;Audience&lt;/h2&gt;
&lt;p&gt;Database/ETL developers, Data Integrators, Data Engineers, Business Analysts, AWS Developers, DevOps.&lt;/p&gt;
&lt;h2 id="designated-environment"&gt;Designated Environment&lt;/h2&gt;
&lt;p&gt;Pre-Prod (UAT/QA/DEV)&lt;/p&gt;
&lt;h2 id="usage"&gt;Usage&lt;/h2&gt;
&lt;div class="codehilite"&gt;&lt;pre&gt;c:\Python35-32\PROJECTS\Ora2S3&amp;gt;dist\oracle_to_Redshift_loader.exe
#############################################################################
#Oracle to Redshift Data Loader (v1.2, beta, 04/05/2016 15:11:53) [64bit]
#Copyright (c): 2016 Alex Buzunov, All rights reserved.
#Agreement: Use this tool at your own risk. Author is not liable for any damages
#           or losses related to the use of this software.
################################################################################
Usage:
  set AWS_ACCESS_KEY_ID=&amp;lt;your access key&amp;gt;
  set AWS_SECRET_ACCESS_KEY=&amp;lt;your secret key&amp;gt;
  set ORACLE_LOGIN=tiger/scott@orcl
  set ORACLE_CLIENT_HOME=C:\app\oracle12\product\12.1.0\dbhome_1

  oracle_to_s3_loader.exe [&amp;lt;ora_query_file&amp;gt;] [&amp;lt;ora_col_delim&amp;gt;] [&amp;lt;ora_add_header&amp;gt;]
                            [&amp;lt;s3_bucket_name&amp;gt;] [&amp;lt;s3_key_name&amp;gt;] [&amp;lt;s3_use_rr&amp;gt;] [&amp;lt;s3_public&amp;gt;]

        --ora_query_file -- SQL query to execute in source Oracle db.
        --ora_col_delim  -- CSV column delimiter (|).
        --ora_add_header -- Add header line to CSV file (False).
        --ora_lame_duck  -- Limit rows for trial load (1000).
        --create_data_dump -- Use it if you want to persist streamed data on your filesystem.

        --s3_bucket_name -- S3 bucket name (always set it).
        --s3_location    -- New bucket location name (us-west-2)
                                Set it if you are creating new bucket
        --s3_key_name    -- CSV file name (to store query results on S3).
                if &amp;lt;s3_key_name&amp;gt; is not specified, the oracle query filename (ora_query_file) will be used.
        --s3_use_rr -- Use reduced redundancy storage (False).
        --s3_write_chunk_size -- Chunk size for multipart upload to S3 (10&amp;lt;&amp;lt;21, ~20MB).
        --s3_public -- Make loaded file public (False).
        --redshift_table -- Target Redshift table.

        Oracle data uploaded to S3 is always compressed (gzip).
&lt;/pre&gt;&lt;/div&gt;


&lt;h1 id="example"&gt;Example&lt;/h1&gt;
&lt;h3 id="environment-variables"&gt;Environment variables&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Set the following environment variables (for all tests):&lt;br/&gt;
set_env.bat:&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="codehilite"&gt;&lt;pre&gt;set AWS_ACCESS_KEY_ID=&amp;lt;your access key&amp;gt;
set AWS_SECRET_ACCESS_KEY=&amp;lt;your secret key&amp;gt;

set ORACLE_LOGIN=tiger/scott@orcl
set ORACLE_CLIENT_HOME=C:\\app\\oracle12\\product\\12.1.0\\dbhome_1
&lt;/pre&gt;&lt;/div&gt;
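Since the access keys come from the environment rather than command-line arguments, the loader presumably reads them along these lines. This is a minimal sketch under that assumption; the helper name and error handling are illustrative, not the tool's actual code.

```python
import os

def read_aws_credentials(env=os.environ):
    """Read AWS keys from the environment; fail fast if either is missing."""
    missing = [name for name in ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")
               if not env.get(name)]
    if missing:
        raise SystemExit("Please set: " + ", ".join(missing))
    return env["AWS_ACCESS_KEY_ID"], env["AWS_SECRET_ACCESS_KEY"]
```

Keeping secrets out of argv means they never appear in the process list or shell history, which is the point of the "keys are not passed as arguments" feature.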


&lt;h3 id="test-load-with-data-dump"&gt;Test load with data dump.&lt;/h3&gt;
&lt;p&gt;In this example the complete table &lt;code&gt;test2&lt;/code&gt; gets uploaded to Amazon-S3 as a compressed CSV file.&lt;/p&gt;
&lt;p&gt;Contents of the file &lt;em&gt;table_query.sql&lt;/em&gt;:&lt;/p&gt;
&lt;div class="codehilite"&gt;&lt;pre&gt;SELECT * FROM test2;
&lt;/pre&gt;&lt;/div&gt;


&lt;p&gt;A temporary dump file is also created for analysis (by default no files are created).&lt;br/&gt;
Use &lt;code&gt;-s, --create_data_dump&lt;/code&gt; to dump streamed data.&lt;/p&gt;
&lt;p&gt;If the target bucket does not exist, it will be created in a user-controlled region.&lt;br/&gt;
Use argument &lt;code&gt;-t, --s3_location&lt;/code&gt; to set the target region name.&lt;/p&gt;
&lt;p&gt;Contents of the file &lt;em&gt;test.bat&lt;/em&gt;:&lt;/p&gt;
&lt;div class="codehilite"&gt;&lt;pre&gt;dist\oracle_to_s3_loader.exe ^
    -q table_query.sql ^
    -d "|" ^
    -e ^
    -b test_bucket ^
    -k oracle_table_export ^
    -r ^
    -p ^
    -s ^
    -t target_table
&lt;/pre&gt;&lt;/div&gt;


&lt;p&gt;Executing &lt;code&gt;test.bat&lt;/code&gt;:&lt;/p&gt;
&lt;div class="codehilite"&gt;&lt;pre&gt;c:\Python35-32\PROJECTS\Ora2Redshift&amp;gt;dist\oracle_to_redshift_loader.exe   -q table_query.sql      -d "|"  -e      -b test_bucket       -k oracle_table_export  -r      -p      -s
Uploading results of "table_query.sql" to existing bucket "test_bucket"
Dumping data to: c:\Python35-32\PROJECTS\Ora2S3\data_dump\table_query\test_bucket\oracle_table_export.20160405_235310.gz
1 chunk 10.0 GB [8.95 sec]
2 chunk 5.94 GB [5.37 sec]
Uncompressed data size: 15.94 GB
Compressed data size: 63.39 MB
Load complete (17.58 sec).
&lt;/pre&gt;&lt;/div&gt;


&lt;h4 id="test-query"&gt;Test query&lt;/h4&gt;
&lt;h3 id="download"&gt;Download&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;git clone https://github.com/alexbuz/Oracle_to_Redshift_Data_Loader&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="" href="https://github.com/alexbuz/Oracle_To_Redshift_Data_Loader/archive/master.zip" rel="nofollow"&gt;Master Release&lt;/a&gt; -- &lt;code&gt;oracle_to_redshift_loader 1.2&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h1 id="faq"&gt;FAQ&lt;/h1&gt;
&lt;h4 id="can-it-load-oracle-data-to-amazon-s3-file"&gt;Can it load Oracle data to Amazon S3 file?&lt;/h4&gt;
&lt;p&gt;Yes, it is the main purpose of this tool.&lt;/p&gt;
&lt;h4 id="can-developers-integrate-oracle_to_s3_data_uploader-into-their-etl-pipelines"&gt;Can developers integrate &lt;code&gt;Oracle_To_S3_Data_Uploader&lt;/code&gt; into their ETL pipelines?&lt;/h4&gt;
&lt;p&gt;Yes. Assuming they are doing it on OS Windows.&lt;/p&gt;
&lt;h4 id="how-fast-is-data-load-using-oracle_to_redshift_data_loader"&gt;How fast is data load using &lt;code&gt;Oracle_To_Redshift_Data_Loader&lt;/code&gt;?&lt;/h4&gt;
&lt;p&gt;As fast as any implementation of multi-part load using Python and boto.&lt;/p&gt;
&lt;h4 id="how-to-inscease-load-speed"&gt;How to inscease load speed?&lt;/h4&gt;
&lt;p&gt;The input data stream is compressed before upload to S3, so not much can be done there.&lt;br/&gt;
You may want to run the tool closer to the source or target for better performance.&lt;/p&gt;
&lt;h4 id="what-are-the-other-ways-to-move-large-amounts-of-data-from-oracle-to-s3"&gt;What are the other ways to move large amounts of data from Oracle to S3?&lt;/h4&gt;
&lt;p&gt;You can write a Sqoop script and schedule it as an 'EMR Activity' under AWS Data Pipeline.&lt;/p&gt;
&lt;h4 id="does-it-create-temporary-data-file-to-facilitate-data-load-to-s3"&gt;Does it create temporary data file to facilitate data load to S3?&lt;/h4&gt;
&lt;p&gt;No&lt;/p&gt;
&lt;h4 id="can-i-log-transfered-data-for-analysis"&gt;Can I log transfered data for analysis?&lt;/h4&gt;
&lt;p&gt;Yes, Use &lt;code&gt;-s, --create_data_dump&lt;/code&gt; to dump streamed data.&lt;/p&gt;
&lt;h4 id="explain-first-step-of-data-transfer"&gt;Explain first step of data transfer?&lt;/h4&gt;
&lt;p&gt;The query file you provided is used to select data from the source Oracle server.&lt;br/&gt;
The stream is compressed before being loaded to S3.&lt;/p&gt;
&lt;h4 id="explain-second-step-of-data-transfer"&gt;Explain second step of data transfer?&lt;/h4&gt;
&lt;p&gt;Compressed data is getting uploaded to S3 using multipart upload protocol.&lt;/p&gt;
&lt;h4 id="explain-third-step-of-data-load-how-data-is-loaded-to-amazon-redshift"&gt;Explain third step of data load. How data is loaded to Amazon Redshift?&lt;/h4&gt;
&lt;p&gt;Your Redshift cluster has to be open to the world (accessible via port 5439 from the internet).&lt;br/&gt;
The tool uses the PostgreSQL &lt;code&gt;COPY&lt;/code&gt; command to load the file located on S3 into the Redshift table.&lt;/p&gt;
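That third step can be sketched with psycopg2 and the Redshift COPY syntax for gzipped CSV on S3. The helper names and DSN handling below are illustrative assumptions, not the tool's actual code.

```python
def build_copy_sql(table, bucket, key, access_key, secret_key, delim="|"):
    """Build a Redshift COPY statement for a gzipped CSV file on S3."""
    creds = "aws_access_key_id={0};aws_secret_access_key={1}".format(
        access_key, secret_key)
    return (
        "COPY {0} FROM 's3://{1}/{2}' "
        "CREDENTIALS '{3}' GZIP DELIMITER '{4}'"
    ).format(table, bucket, key, creds, delim)

def load_to_redshift(dsn, sql):
    """Execute the COPY against the cluster (port 5439 must be reachable)."""
    import psycopg2  # PostgreSQL driver; Redshift speaks the same wire protocol
    conn = psycopg2.connect(dsn)
    try:
        with conn.cursor() as cur:
            cur.execute(sql)
        conn.commit()  # COPY is transactional; commit makes the rows visible
    finally:
        conn.close()
```

COPY pulls the file from S3 in parallel on the cluster side, which is why staging on S3 first is faster than row-by-row inserts.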
&lt;h4 id="what-technology-was-used-to-create-this-tool"&gt;What technology was used to create this tool&lt;/h4&gt;
&lt;p&gt;I used SQL&lt;em&gt;Plus, Python, Boto to write it.&lt;br/&gt;
Boto is used to upload file to S3. &lt;br/&gt;
SQL&lt;/em&gt;Plus is used to spool data to compressor pipe.&lt;br/&gt;
psycopg2 is used to establish ODBC connection with Redshift clusted and execute &lt;code&gt;COPY&lt;/code&gt; command.&lt;/p&gt;
&lt;h4 id="does-it-delete-file-from-s3-after-upload"&gt;Does it delete file from S3 after upload?&lt;/h4&gt;
&lt;p&gt;No&lt;/p&gt;
&lt;h4 id="does-it-create-target-redshift-table"&gt;Does it create target Redshift table?&lt;/h4&gt;
&lt;p&gt;No&lt;/p&gt;
&lt;h4 id="where-are-the-sources"&gt;Where are the sources?&lt;/h4&gt;
&lt;p&gt;Please, contact me for sources.&lt;/p&gt;
&lt;h4 id="can-you-modify-functionality-and-add-features"&gt;Can you modify functionality and add features?&lt;/h4&gt;
&lt;p&gt;Yes, please, ask me for new features.&lt;/p&gt;
&lt;h4 id="what-other-aws-tools-youve-created"&gt;What other AWS tools you've created?&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a class="alink" href="/p/aws-data-tools/wiki/Oracle_To_S3_Data_Uploader/"&gt;[Oracle_To_S3_Data_Uploader]&lt;/a&gt; (https://github.com/alexbuz/Oracle_To_S3_Data_Uploader) - Stream Oracle data to Amazon- S3.&lt;/li&gt;
&lt;li&gt;&lt;a class="alink" href="/p/aws-data-tools/wiki/CSV_Loader_For_Redshift/"&gt;[CSV_Loader_For_Redshift]&lt;/a&gt; (https://github.com/alexbuz/CSV_Loader_For_Redshift/blob/master/README.md) - Append CSV data to Amazon-Redshift from Windows.&lt;/li&gt;
&lt;li&gt;&lt;a class="alink" href="/p/aws-data-tools/wiki/S3_Sanity_Check/"&gt;[S3_Sanity_Check]&lt;/a&gt; (https://github.com/alexbuz/S3_Sanity_Check/blob/master/README.md) - let's you &lt;code&gt;ping&lt;/code&gt; Amazon-S3 bucket to see if it's publicly readable.&lt;/li&gt;
&lt;li&gt;&lt;a class="" href="https://github.com/alexbuz/EC2_Metrics_Plotter/blob/master/README.md" rel="nofollow"&gt;EC2_Metrics_Plotter&lt;/a&gt; - plots any CloudWatch EC2 instance  metric stats.&lt;/li&gt;
&lt;li&gt;&lt;a class="" href="https://github.com/alexbuz/S3_File_Uploader/blob/master/README.md" rel="nofollow"&gt;S3_File_Uploader&lt;/a&gt; - uploads file from Windows to S3.&lt;/li&gt;
&lt;/ul&gt;
&lt;h4 id="do-you-have-any-aws-certifications"&gt;Do you have any AWS Certifications?&lt;/h4&gt;
&lt;p&gt;Yes, &lt;a class="" href="https://raw.githubusercontent.com/alexbuz/FAQs/master/images/AWS_Ceritied_Developer_Associate.png" rel="nofollow"&gt;AWS Certified Developer (Associate)&lt;/a&gt;&lt;/p&gt;
&lt;h4 id="can-you-create-similarcustom-data-tool-for-our-business"&gt;Can you create similar/custom data tool for our business?&lt;/h4&gt;
&lt;p&gt;Yes, you can PM me here or email at &lt;code&gt;alex_buz@yahoo.com&lt;/code&gt;.&lt;br/&gt;
I'll get back to you within hours.&lt;/p&gt;
&lt;h3 id="links"&gt;Links&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a class="" href="https://github.com/alexbuz/FAQs/blob/master/README.md" rel="nofollow"&gt;Employment FAQ&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;/div&gt;</summary></entry><entry><title>Oracle_To_S3_Data_Uploader modified by Alex Buzunov</title><link href="https://sourceforge.net/p/aws-data-tools/wiki/Oracle_To_S3_Data_Uploader/" rel="alternate"/><published>2016-04-07T12:31:04.650000Z</published><updated>2016-04-07T12:31:04.650000Z</updated><author><name>Alex Buzunov</name><uri>https://sourceforge.net/u/alexbuz/</uri></author><id>https://sourceforge.net0bce35fd4f138dbdaed4ab47b9212e6fd0a24e52</id><summary type="html">&lt;div class="markdown_content"&gt;&lt;h1 id="oracle-to-s3-data-uploader"&gt;Oracle-to-S3 data uploader.&lt;/h1&gt;
&lt;p&gt;Lets you stream your Oracle table/query data to Amazon-S3 from the Windows CLI (command line).&lt;/p&gt;
&lt;p&gt;Features:&lt;br/&gt;
 - Streams Oracle table data to Amazon-S3.&lt;br/&gt;
 - No need to create CSV extracts before upload to S3.&lt;br/&gt;
 - Data stream is compressed while upload to S3.&lt;br/&gt;
 - No need for Amazon AWS CLI.&lt;br/&gt;
 - Works from your OS Windows desktop (command line).&lt;br/&gt;
 - It's executable (Oracle_To_S3_Uploader.exe)  - no need for Python install.&lt;br/&gt;
 - It's 64 bit - it will work on any vanilla DOS for 64-bit Windows.&lt;br/&gt;
 - AWS Access Keys are not passed as arguments. &lt;br/&gt;
 - Written using Python/boto/PyInstaller.&lt;/p&gt;
&lt;h2 id="version"&gt;Version&lt;/h2&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;OS&lt;/th&gt;
&lt;th&gt;Platform&lt;/th&gt;
&lt;th&gt;Version&lt;/th&gt;
&lt;th&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Windows&lt;/td&gt;
&lt;td&gt;64bit&lt;/td&gt;
&lt;td&gt;&lt;span&gt;[1.2 beta]&lt;/span&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;h2 id="purpose"&gt;Purpose&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Stream (upload) Oracle table data to Amazon-S3.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="how-it-works"&gt;How it works&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Tool connects to source Oracle DB and opens data pipe for reading.&lt;/li&gt;
&lt;li&gt;Data is pumped to S3 using multipart upload.&lt;/li&gt;
&lt;li&gt;Optional upload to Reduced Redundancy storage (not RR by default).&lt;/li&gt;
&lt;li&gt;Optional "make it public" after upload (private by default)&lt;/li&gt;
&lt;li&gt;If the bucket doesn't exist, it is created.&lt;/li&gt;
&lt;li&gt;You can control the region where the new bucket is created.&lt;/li&gt;
&lt;li&gt;Streamed data can be tee'd (dumped on disk) during upload.&lt;/li&gt;
&lt;li&gt;If not set, the S3 key defaults to the query file name.&lt;/li&gt;
&lt;li&gt;It's a Python/boto script&lt;ul&gt;
&lt;li&gt;Boto S3 docs: &lt;a href="http://boto.cloudhackers.com/en/latest/ref/s3.html" rel="nofollow"&gt;http://boto.cloudhackers.com/en/latest/ref/s3.html&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Executable is created using &lt;a href="http://www.pyinstaller.org/" rel="nofollow"&gt;pyInstaller&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
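The "data stream is compressed while upload" feature can be sketched as incremental gzip compression of a pipe, yielding compressed bytes as they become available. The buffering helper below is illustrative; the real tool pipes SQL*Plus spool output through a compressor.

```python
import gzip
import io

def gzip_stream(fp, read_size=65536):
    """Compress a stream incrementally, yielding gzip bytes as they are ready.

    Nothing is ever written to disk: raw data goes in, compressed
    chunks come out, ready to be handed to a multipart uploader.
    """
    buf = io.BytesIO()
    gz = gzip.GzipFile(fileobj=buf, mode="wb")
    while True:
        raw = fp.read(read_size)
        if not raw:
            break
        gz.write(raw)
        yield buf.getvalue()   # hand over whatever has been compressed so far
        buf.seek(0)
        buf.truncate()         # reset the staging buffer
    gz.close()                 # flush remaining data and the gzip trailer
    yield buf.getvalue()
```

Because compression happens in-flight, the 15.94 GB extract in the transcript below travels as only tens of megabytes.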
&lt;h2 id="audience"&gt;Audience&lt;/h2&gt;
&lt;p&gt;Database/ETL developers, Data Integrators, Data Engineers, Business Analysts, AWS Developers, DevOps.&lt;/p&gt;
&lt;h2 id="designated-environment"&gt;Designated Environment&lt;/h2&gt;
&lt;p&gt;Pre-Prod (UAT/QA/DEV)&lt;/p&gt;
&lt;h2 id="usage"&gt;Usage&lt;/h2&gt;
&lt;div class="codehilite"&gt;&lt;pre&gt;c:\Python35-32\PROJECTS\Ora2S3&amp;gt;dist\oracle_to_s3_uploader.exe
#############################################################################
#Oracle to S3 Data Uploader (v1.2, beta, 04/05/2016 15:11:53) [64bit]
#Copyright (c): 2016 Alex Buzunov, All rights reserved.
#Agreement: Use this tool at your own risk. Author is not liable for any damages
#           or losses related to the use of this software.
################################################################################
Usage:
  set AWS_ACCESS_KEY_ID=&amp;lt;your access key&amp;gt;
  set AWS_SECRET_ACCESS_KEY=&amp;lt;your secret key&amp;gt;
  set ORACLE_LOGIN=tiger/scott@orcl
  set ORACLE_CLIENT_HOME=C:\app\oracle12\product\12.1.0\dbhome_1

  oracle_to_s3_uploader.exe [&amp;lt;ora_query_file&amp;gt;] [&amp;lt;ora_col_delim&amp;gt;] [&amp;lt;ora_add_header&amp;gt;]
                            [&amp;lt;s3_bucket_name&amp;gt;] [&amp;lt;s3_key_name&amp;gt;] [&amp;lt;s3_use_rr&amp;gt;] [&amp;lt;s3_public&amp;gt;]

        --ora_query_file -- SQL query to execute in source Oracle db.
        --ora_col_delim  -- CSV column delimiter (|).
        --ora_add_header -- Add header line to CSV file (False).
        --ora_lame_duck  -- Limit rows for trial upload (1000).
        --create_data_dump -- Use it if you want to persist streamed data on your filesystem.

        --s3_bucket_name -- S3 bucket name (always set it).
        --s3_location    -- New bucket location name (us-west-2)
                                Set it if you are creating new bucket
        --s3_key_name    -- CSV file name (to store query results on S3).
                if &amp;lt;s3_key_name&amp;gt; is not specified, the oracle query filename (ora_query_file) will be used.
        --s3_use_rr -- Use reduced redundancy storage (False).
        --s3_write_chunk_size -- Chunk size for multipart upload to S3 (10&amp;lt;&amp;lt;21, ~20MB).
        --s3_public -- Make uploaded file public (False).

        Oracle data uploaded to S3 is always compressed (gzip).
&lt;/pre&gt;&lt;/div&gt;


&lt;h1 id="example"&gt;Example&lt;/h1&gt;
&lt;h3 id="environment-variables"&gt;Environment variables&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Set the following environment variables (for all tests):&lt;br/&gt;
set_env.bat:&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="codehilite"&gt;&lt;pre&gt;set AWS_ACCESS_KEY_ID=&amp;lt;your access key&amp;gt;
set AWS_SECRET_ACCESS_KEY=&amp;lt;your secret key&amp;gt;

set ORACLE_LOGIN=tiger/scott@orcl
set ORACLE_CLIENT_HOME=C:\\app\\oracle12\\product\\12.1.0\\dbhome_1
&lt;/pre&gt;&lt;/div&gt;


&lt;h3 id="test-upload-with-data-dump"&gt;Test upload with data dump.&lt;/h3&gt;
&lt;p&gt;In this example the complete table &lt;code&gt;test2&lt;/code&gt; gets uploaded to Amazon-S3 as a compressed CSV file.&lt;/p&gt;
&lt;p&gt;Contents of the file &lt;em&gt;table_query.sql&lt;/em&gt;:&lt;/p&gt;
&lt;div class="codehilite"&gt;&lt;pre&gt;SELECT * FROM test2;
&lt;/pre&gt;&lt;/div&gt;


&lt;p&gt;A temporary dump file is also created for analysis (by default no files are created).&lt;br/&gt;
Use &lt;code&gt;-s, --create_data_dump&lt;/code&gt; to dump streamed data.&lt;/p&gt;
&lt;p&gt;If the target bucket does not exist, it will be created in a user-controlled region.&lt;br/&gt;
Use argument &lt;code&gt;-t, --s3_location&lt;/code&gt; to set the target region name.&lt;/p&gt;
&lt;p&gt;Contents of the file &lt;em&gt;test.bat&lt;/em&gt;:&lt;/p&gt;
&lt;div class="codehilite"&gt;&lt;pre&gt;dist\oracle_to_s3_uploader.exe ^
    -q table_query.sql ^
    -d "|" ^
    -e ^
    -b test_bucket ^
    -k oracle_table_export ^
    -r ^
    -p ^
    -s
&lt;/pre&gt;&lt;/div&gt;


&lt;p&gt;Executing &lt;code&gt;test.bat&lt;/code&gt;:&lt;/p&gt;
&lt;div class="codehilite"&gt;&lt;pre&gt;c:\Python35-32\PROJECTS\Ora2S3&amp;gt;dist\oracle_to_s3_uploader.exe   -q table_query.sql      -d "|"  -e      -b test_bucket       -k oracle_table_export  -r      -p      -s
Uploading results of "table_query.sql" to existing bucket "test_bucket"
Dumping data to: c:\Python35-32\PROJECTS\Ora2S3\data_dump\table_query\test_bucket\oracle_table_export.20160405_235310.gz
1 chunk 10.0 GB [8.95 sec]
2 chunk 5.94 GB [5.37 sec]
Uncompressed data size: 15.94 GB
Compressed data size: 63.39 MB
Upload complete (17.58 sec).
Your PUBLIC upload is at: https://s3-us-west-2.amazonaws.com/test_bucket/oracle_table_export.gz
&lt;/pre&gt;&lt;/div&gt;


&lt;p&gt;&lt;img alt="Test results" rel="nofollow" src="https://raw.githubusercontent.com/alexbuz/Oracle_To_S3_Data_Uploader/master/dist-64bit/ora_to_s3_upload.png" title="Test Results"/&gt;&lt;/p&gt;
&lt;h3 id="download"&gt;Download&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;git clone https://github.com/alexbuz/Oracle_To_S3_Data_Uploader&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="" href="https://github.com/alexbuz/Oracle_To_S3_Data_Uploader/archive/master.zip" rel="nofollow"&gt;Master Release&lt;/a&gt; -- &lt;code&gt;oracle_to_s3_uploader 1.2&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h1 id="faq"&gt;FAQ&lt;/h1&gt;
&lt;h4 id="can-it-load-oracle-data-to-amazon-s3-file"&gt;Can it load Oracle data to Amazon S3 file?&lt;/h4&gt;
&lt;p&gt;Yes, it is the main purpose of this tool.&lt;/p&gt;
&lt;h4 id="can-developers-integrate-oracle_to_s3_data_uploader-into-their-etl-pipelines"&gt;Can developers integrate &lt;code&gt;Oracle_To_S3_Data_Uploader&lt;/code&gt; into their ETL pipelines?&lt;/h4&gt;
&lt;p&gt;Yes. Assuming they are doing it on OS Windows.&lt;/p&gt;
&lt;h4 id="how-fast-is-data-upload-using-csv-loader-for-redshift"&gt;How fast is data upload using &lt;code&gt;CSV Loader for Redshift&lt;/code&gt;?&lt;/h4&gt;
&lt;p&gt;As fast as any implementation of multi-part load using Python and boto.&lt;/p&gt;
&lt;h4 id="how-to-inscease-upload-speed"&gt;How to inscease upload speed?&lt;/h4&gt;
&lt;p&gt;The input data stream is compressed before upload to S3, so not much can be done there.&lt;br/&gt;
You may want to run the tool closer to the source or target for better performance.&lt;/p&gt;
&lt;h4 id="what-are-the-other-ways-to-move-large-amounts-of-data-from-oracle-to-s3"&gt;What are the other ways to move large amounts of data from Oracle to S3?&lt;/h4&gt;
&lt;p&gt;You can write a Sqoop script and schedule it as an 'EMR Activity' under AWS Data Pipeline.&lt;/p&gt;
&lt;h4 id="does-it-create-temporary-data-file-to-facilitate-data-load-to-s3"&gt;Does it create temporary data file to facilitate data load to S3?&lt;/h4&gt;
&lt;p&gt;No&lt;/p&gt;
&lt;h4 id="can-i-log-transfered-data-for-analysis"&gt;Can I log transfered data for analysis?&lt;/h4&gt;
&lt;p&gt;Yes, Use &lt;code&gt;-s, --create_data_dump&lt;/code&gt; to dump streamed data.&lt;/p&gt;
&lt;h4 id="explain-first-step-of-data-transfer"&gt;Explain first step of data transfer?&lt;/h4&gt;
&lt;p&gt;The query file you provided is used to select data from the source Oracle server.&lt;br/&gt;
The stream is compressed before being loaded to S3.&lt;/p&gt;
&lt;h4 id="explain-second-step-of-data-transfer"&gt;Explain second step of data transfer?&lt;/h4&gt;
&lt;p&gt;Compressed data is getting uploaded to S3 using multipart upload protocol.&lt;/p&gt;
&lt;h4 id="what-technology-was-used-to-create-this-tool"&gt;What technology was used to create this tool&lt;/h4&gt;
&lt;p&gt;I used SQL&lt;em&gt;Plus, Python, Boto to write it.&lt;br/&gt;
Boto is used to upload file to S3. &lt;br/&gt;
SQL&lt;/em&gt;Plus is used to spool data to compressor pipe.&lt;/p&gt;
&lt;h4 id="where-are-the-sources"&gt;Where are the sources?&lt;/h4&gt;
&lt;p&gt;Please, contact me for sources.&lt;/p&gt;
&lt;h4 id="can-you-modify-functionality-and-add-features"&gt;Can you modify functionality and add features?&lt;/h4&gt;
&lt;p&gt;Yes, please, ask me for new features.&lt;/p&gt;
&lt;h4 id="what-other-aws-tools-youve-created"&gt;What other AWS tools you've created?&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a class="alink" href="/p/aws-data-tools/wiki/CSV_Loader_For_Redshift/"&gt;[CSV_Loader_For_Redshift]&lt;/a&gt; (https://github.com/alexbuz/CSV_Loader_For_Redshift/blob/master/README.md) - Append CSV data to Amazon-Redshift from Windows.&lt;/li&gt;
&lt;li&gt;&lt;a class="alink" href="/p/aws-data-tools/wiki/S3_Sanity_Check/"&gt;[S3_Sanity_Check]&lt;/a&gt; (https://github.com/alexbuz/S3_Sanity_Check/blob/master/README.md) - let's you &lt;code&gt;ping&lt;/code&gt; Amazon-S3 bucket to see if it's publicly readable.&lt;/li&gt;
&lt;li&gt;&lt;a class="" href="https://github.com/alexbuz/EC2_Metrics_Plotter/blob/master/README.md" rel="nofollow"&gt;EC2_Metrics_Plotter&lt;/a&gt; - plots any CloudWatch EC2 instance  metric stats.&lt;/li&gt;
&lt;li&gt;&lt;a class="" href="https://github.com/alexbuz/S3_File_Uploader/blob/master/README.md" rel="nofollow"&gt;S3_File_Uploader&lt;/a&gt; - uploads file from Windows to S3.&lt;/li&gt;
&lt;/ul&gt;
&lt;h4 id="do-you-have-any-aws-certifications"&gt;Do you have any AWS Certifications?&lt;/h4&gt;
&lt;p&gt;Yes, &lt;a class="" href="https://raw.githubusercontent.com/alexbuz/FAQs/master/images/AWS_Ceritied_Developer_Associate.png" rel="nofollow"&gt;AWS Certified Developer (Associate)&lt;/a&gt;&lt;/p&gt;
&lt;h4 id="can-you-create-similarcustom-data-tool-for-our-business"&gt;Can you create similar/custom data tool for our business?&lt;/h4&gt;
&lt;p&gt;Yes, you can PM me here or email at &lt;code&gt;alex_buz@yahoo.com&lt;/code&gt;.&lt;br/&gt;
I'll get back to you within hours.&lt;/p&gt;
&lt;h3 id="links"&gt;Links&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a class="" href="https://github.com/alexbuz/FAQs/blob/master/README.md" rel="nofollow"&gt;Employment FAQ&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;/div&gt;</summary></entry><entry><title>EC2_Metrics_Plotter modified by Alex Buzunov</title><link href="https://sourceforge.net/p/aws-data-tools/wiki/EC2_Metrics_Plotter/" rel="alternate"/><published>2016-04-07T12:29:06.268000Z</published><updated>2016-04-07T12:29:06.268000Z</updated><author><name>Alex Buzunov</name><uri>https://sourceforge.net/u/alexbuz/</uri></author><id>https://sourceforge.net76297549432261ad236fbc0ededfbfd6521d04aa</id><summary type="html">&lt;div class="markdown_content"&gt;&lt;h1 id="cloudwatch-awsec2-instance-metrics-plotter"&gt;CloudWatch AWS/EC2 instance metrics plotter.&lt;/h1&gt;
&lt;p&gt;Purpose:&lt;br/&gt;
 - Generate and plot statistics for a CloudWatch EC2 instance.&lt;br/&gt;
 - All metrics and statistics are supported.&lt;/p&gt;
&lt;p&gt;Included:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;em&gt;Statistics&lt;/em&gt;:&lt;br/&gt;
    Sum,Maximum,Minimum,SampleCount,Average&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;em&gt;Metrics&lt;/em&gt;:&lt;br/&gt;
    CPUUtilization,NetworkIn,NetworkOut,NetworkPacketsIn,&lt;br/&gt;
    NetworkPacketsOut,DiskWriteBytes,DiskReadBytes,DiskWriteOps,&lt;br/&gt;
    DiskReadOps,CPUCreditBalance,CPUCreditUsage,StatusCheckFailed,&lt;br/&gt;
    StatusCheckFailed_Instance,StatusCheckFailed_System&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Written using Python/boto3.&lt;br/&gt;
Compiled using PyInstaller.&lt;/p&gt;
&lt;h2 id="version"&gt;Version&lt;/h2&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;OS&lt;/th&gt;
&lt;th&gt;Platform&lt;/th&gt;
&lt;th&gt;Version&lt;/th&gt;
&lt;th&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Windows&lt;/td&gt;
&lt;td&gt;32bit&lt;/td&gt;
&lt;td&gt;&lt;span&gt;[0.1.0 beta]&lt;/span&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;h2 id="purpose"&gt;Purpose&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Generate plots for AWS EC2 metrics and statistics.&lt;/li&gt;
&lt;li&gt;Helps you generate plots on demand and review them using a generated HTML report.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="how-it-works"&gt;How it works&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;ec2metrics.exe connects to EC2 and reads datapoints for a given CloudWatch EC2 instance/metric/statistic combo.&lt;/li&gt;
&lt;li&gt;A plot is created using matplotlib and saved to the filesystem.&lt;/li&gt;
&lt;li&gt;An HTML report is generated for previewing the saved metric plots.&lt;/li&gt;
&lt;li&gt;It does not work for grouped CloudWatch EC2 instance metrics.&lt;/li&gt;
&lt;/ul&gt;
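The fetch-and-plot steps above can be sketched with boto3's CloudWatch client and matplotlib. The instance id and the series helper are illustrative assumptions, not the tool's actual code.

```python
import datetime

def to_series(datapoints, statistic="Average"):
    """Turn a CloudWatch datapoint list into time-sorted (timestamps, values).

    CloudWatch returns datapoints in arbitrary order, so plotting
    requires sorting by timestamp first.
    """
    points = sorted(datapoints, key=lambda d: d["Timestamp"])
    times = [d["Timestamp"] for d in points]
    values = [d[statistic] for d in points]
    return times, values

def plot_metric(instance_id, metric="CPUUtilization", statistic="Average",
                period_sec=300, from_min=60):
    """Fetch datapoints for one instance/metric/statistic and save a plot."""
    import boto3
    import matplotlib.pyplot as plt
    now = datetime.datetime.utcnow()
    resp = boto3.client("cloudwatch").get_metric_statistics(
        Namespace="AWS/EC2",
        MetricName=metric,
        Dimensions=[{"Name": "InstanceId", "Value": instance_id}],
        StartTime=now - datetime.timedelta(minutes=from_min),
        EndTime=now,
        Period=period_sec,
        Statistics=[statistic],
    )
    times, values = to_series(resp["Datapoints"], statistic)
    plt.plot(times, values)
    plt.title("{0}: {1} ({2})".format(instance_id, metric, statistic))
    plt.savefig("plots/{0}_{1}.png".format(metric, statistic))
```

Credentials and region come from `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, and `AWS_DEFAULT_REGION`, matching the usage text below.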
&lt;h2 id="audience"&gt;Audience&lt;/h2&gt;
&lt;p&gt;Database/ETL developers, Data Integrators, Data Engineers, Business Analysts, AWS Developers, DevOps&lt;/p&gt;
&lt;h2 id="designated-environment"&gt;Designated Environment&lt;/h2&gt;
&lt;p&gt;Pre-Prod (UAT/QA/DEV)&lt;/p&gt;
&lt;h2 id="usage"&gt;Usage&lt;/h2&gt;
&lt;div class="codehilite"&gt;&lt;pre&gt;C:\Python35-32&amp;gt;dist\ec2metrics\ec2metrics.exe
## Plots EC2 CPUUtilization metric for given instance id.
##
## Generates matplotlib plots for given instance/statistic/metric.
##
Usage:
  set AWS_ACCESS_KEY_ID=&amp;lt;your access key&amp;gt;
  set AWS_SECRET_ACCESS_KEY=&amp;lt;your secret key&amp;gt;
  set AWS_DEFAULT_REGION=&amp;lt;your region&amp;gt; (for example: us-west-2)
  ec2metrics.exe [&amp;lt;instance&amp;gt;] [&amp;lt;period_min&amp;gt;] [&amp;lt;from_min&amp;gt;] [&amp;lt;to_min&amp;gt;]
        [&amp;lt;statistic&amp;gt;] [&amp;lt;metric_name&amp;gt;] [&amp;lt;namespace&amp;gt;]
        [&amp;lt;show_plot&amp;gt; or &amp;lt;show_report&amp;gt;]
        [&amp;lt;plot_dir&amp;gt;] [&amp;lt;plot_dir&amp;gt;]

        [-b] --instance         -- EC2 instance name (i-********).
        [-p] --period_min       -- Aggregation interval (5 min).
        [-f] --from_min         -- Start from, min (60).
        [-t] --to_min           -- End at, min (0 - present).
        [-s] --statistic        -- Statistic type (Average).
           Could be one of: Sum,Maximum,Minimum,SampleCount,Average
        [-m] --metric_name  -- Metric name (CPUUtilization)
           Could be one of:
                CPUUtilization,NetworkIn,NetworkOut,NetworkPacketsIn,
                NetworkPacketsOut,DiskWriteBytes,DiskReadBytes,DiskWriteOps,
                DiskReadOps,CPUCreditBalance,CPUCreditUsage,StatusCheckFailed,
                StatusCheckFailed_Instance,StatusCheckFailed_System
        [-g] --namespace        -- CloudWatch namespace,
        container for metric (AWS/EC2).

        [-r] --show_plot        -- Open plotter window (False).
        [-n] --show_report  -- Open browser with html report (True).

        [-d] --plot_dir         -- Target plot dir (plots).
        [-e] --plot_dir         -- Timestamp to append to plot_dir
        (current date).

        Index.html is generated in &amp;lt;plot_dir&amp;gt;\&amp;lt;timestamp&amp;gt;
&lt;/pre&gt;&lt;/div&gt;
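For reference, the fetch-and-plot flow described above can be sketched in a few lines of Python. This is an illustrative sketch only, written against boto3 (the tool itself is built on the legacy boto); `window` and `plot_metric` are hypothetical helper names, not part of ec2metrics.

```python
import datetime

def window(from_min, to_min):
    """Translate --from_min/--to_min offsets into absolute UTC timestamps."""
    now = datetime.datetime.utcnow()
    return (now - datetime.timedelta(minutes=from_min),
            now - datetime.timedelta(minutes=to_min))

def plot_metric(instance_id, metric='CPUUtilization', statistic='Average',
                period_min=5, from_min=60, to_min=0, out='plot.png'):
    """Read CloudWatch datapoints for one instance/metric/statistic and save a PNG."""
    import boto3
    import matplotlib
    matplotlib.use('Agg')  # headless backend: write the PNG, no plotter window
    import matplotlib.pyplot as plt

    start, end = window(from_min, to_min)
    cw = boto3.client('cloudwatch')
    resp = cw.get_metric_statistics(
        Namespace='AWS/EC2',
        MetricName=metric,
        Dimensions=[{'Name': 'InstanceId', 'Value': instance_id}],
        StartTime=start, EndTime=end,
        Period=period_min * 60,          # CloudWatch periods are in seconds
        Statistics=[statistic])
    points = sorted(resp['Datapoints'], key=lambda p: p['Timestamp'])
    plt.plot([p['Timestamp'] for p in points],
             [p[statistic] for p in points])
    plt.title('%s %s (%d min)' % (metric, statistic, period_min))
    plt.savefig(out)
```

Credentials and region come from the environment variables shown in the usage block, as with the tool itself.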


&lt;h2 id="environment-variables"&gt;Environment variables&lt;/h2&gt;
&lt;div class="codehilite"&gt;&lt;pre&gt;  set AWS_ACCESS_KEY_ID=&amp;lt;your access key&amp;gt;
  set AWS_SECRET_ACCESS_KEY=&amp;lt;your secret key&amp;gt;
  set AWS_DEFAULT_REGION=&amp;lt;your region&amp;gt; (for example: us-west-2)
&lt;/pre&gt;&lt;/div&gt;


&lt;h1 id="examples"&gt;Examples&lt;/h1&gt;
&lt;h3 id="plot-averageminimum-for-networkin-cloudwatch-ec2-metric"&gt;Plot "Average,Minimum" for "NetworkIn" CloudWatch EC2 metric.&lt;/h3&gt;
&lt;div class="codehilite"&gt;&lt;pre&gt;ec2metrics.exe --instance i-fe9cea26 -f 1000  -p 10  -s Average,Minimum -m NetworkIn  -r

200(0100/0100): i-fe9cea26: NetworkIn: Sum
200(0100/0100): i-fe9cea26: NetworkIn: Maximum
200(0100/0100): i-fe9cea26: NetworkIn: Minimum
200(0100/0100): i-fe9cea26: NetworkIn: SampleCount
200(0100/0100): i-fe9cea26: NetworkIn: Average

Report is at: C:\Python35-32\plots\20160327_220118\index.html
&lt;/pre&gt;&lt;/div&gt;


&lt;h4 id="result"&gt;Result:&lt;/h4&gt;
&lt;p&gt;&lt;img alt="NetworkIn/Average/10min" rel="nofollow" src="https://raw.githubusercontent.com/alexbuz/EC2_Metrics_Plotter/master/plots/EC2_NetworkIn/by_metric/NetworkIn/Average/10/NetworkIn.Average.10.i-fe9cea26.png"/&gt;&lt;/p&gt;
&lt;h4 id="html-report"&gt;Html report&lt;/h4&gt;
&lt;p&gt;A report is generated with previews of all plots created by this job.&lt;br/&gt;
&lt;img alt="ALL" rel="nofollow" src="https://raw.githubusercontent.com/alexbuz/EC2_Metrics_Plotter/master/plot_reports/networkin.png"/&gt;&lt;/p&gt;
&lt;h3 id="plot-summaximumminimumsamplecountaverage-stats-for-cpuutilization-cloudwatch-ec2-metric"&gt;Plot "Sum,Maximum,Minimum,SampleCount,Average" stats for "CPUUtilization" CloudWatch EC2 metric.&lt;/h3&gt;
&lt;div class="codehilite"&gt;&lt;pre&gt;ec2metrics.exe --instance i-fe9cea26 -f 500  -p 1  -s Sum,Maximum,Minimum,SampleCount,Average -m CPUUtilization  -r
200(0084/0099): i-fe9cea26: CPUUtilization: Sum
200(0084/0099): i-fe9cea26: CPUUtilization: Maximum
200(0000/0099): i-fe9cea26: CPUUtilization: Minimum
200(0099/0099): i-fe9cea26: CPUUtilization: SampleCount
200(0084/0099): i-fe9cea26: CPUUtilization: Average

Report is at: c:\Python35-32\plots\20160328_113906\index.html
&lt;/pre&gt;&lt;/div&gt;


&lt;h4 id="result_1"&gt;Result:&lt;/h4&gt;
&lt;p&gt;&lt;img alt="CPUCreditUsage/Average/30min" rel="nofollow" src="https://raw.githubusercontent.com/alexbuz/EC2_Metrics_Plotter/master/plots/CPUUtilization/by_instance/i-fe9cea26/1/CPUUtilization.Average.1.i-fe9cea26.png"/&gt;&lt;/p&gt;
&lt;h3 id="plot-summaximumminimumsamplecountaverage-stats-for-cpucreditusage-cloudwatch-ec2-metric"&gt;Plot "Sum,Maximum,Minimum,SampleCount,Average" stats for "CPUCreditUsage" CloudWatch EC2 metric.&lt;/h3&gt;
&lt;div class="codehilite"&gt;&lt;pre&gt;ec2metrics.exe --instance i-fe9cea26 -f 6000  -p 30  -s Sum,Maximum,Minimum,SampleCount,Average -m CPUCreditUsage  -r -t 3000 -e CPUCreditUsage
200(0006/0027): i-fe9cea26: CPUCreditUsage: Sum
200(0006/0027): i-fe9cea26: CPUCreditUsage: Maximum
200(0001/0027): i-fe9cea26: CPUCreditUsage: Minimum
200(0027/0027): i-fe9cea26: CPUCreditUsage: SampleCount
200(0006/0027): i-fe9cea26: CPUCreditUsage: Average
Report is at: c:\Python35-32\plots\CPUCreditUsage\index.html
&lt;/pre&gt;&lt;/div&gt;


&lt;h4 id="result_2"&gt;Result:&lt;/h4&gt;
&lt;p&gt;&lt;img alt="CPUCreditUsage/Average/30min" rel="nofollow" src="https://raw.githubusercontent.com/alexbuz/EC2_Metrics_Plotter/master/plots/CPUCreditUsage/by_metric/CPUCreditUsage/Average/30/CPUCreditUsage.Average.30.i-fe9cea26.png"/&gt;&lt;/p&gt;
&lt;h3 id="plot-all-stats-for-all-cloudwatch-ec2-metrics"&gt;Plot all stats for all CloudWatch EC2 metrics.&lt;/h3&gt;
&lt;div class="codehilite"&gt;&lt;pre&gt;c:\Python35-32&amp;gt;dist\ec2metrics\ec2metrics.exe --from_min 3000 --instance 'i-fe9cea26,i-fe9cea26' --metric_name CPUUtilization,NetworkIn,NetworkOut,NetworkPacketsIn,NetworkPacketsOut,DiskWriteBytes,DiskReadBytes,DiskWriteOps,DiskReadOps,CPUCreditBalance,CPUCreditUsage,StatusCheckFailed,StatusCheckFailed_Instance,StatusCheckFailed_System --namespace AWS/EC2 --period_min 1 --plot_dir C:\Python35-32\plots --statistic Average,Minimum,Maximum,Sum  --to_min 2000 -r -e All_Metrics

200(0174/0200): i-fe9cea26: CPUUtilization: Average
200(0001/0200): i-fe9cea26: CPUUtilization: Minimum
200(0174/0200): i-fe9cea26: CPUUtilization: Maximum
200(0174/0200): i-fe9cea26: CPUUtilization: Sum
200(0200/0200): i-fe9cea26: NetworkIn: Average
...
200(0000/1000): i-fe9cea26: StatusCheckFailed_Instance: Sum
200(0000/1000): i-fe9cea26: StatusCheckFailed_System: Average
200(0000/1000): i-fe9cea26: StatusCheckFailed_System: Minimum
200(0000/1000): i-fe9cea26: StatusCheckFailed_System: Maximum
200(0000/1000): i-fe9cea26: StatusCheckFailed_System: Sum

Report is at: C:\Python35-32\plots\All_Metrics\index.html
&lt;/pre&gt;&lt;/div&gt;


&lt;h4 id="result_3"&gt;Result:&lt;/h4&gt;
&lt;p&gt;One of the plots:&lt;br/&gt;
&lt;img alt="NetworkIn/Average/10min" rel="nofollow" src="https://raw.githubusercontent.com/alexbuz/EC2_Metrics_Plotter/master/plots/CPUCreditBalance/by_instance/i-fe9cea26/30/CPUCreditBalance.Sum.30.i-fe9cea26.png"/&gt;&lt;/p&gt;
&lt;h4 id="html-report_1"&gt;Html report&lt;/h4&gt;
&lt;p&gt;A report is generated with previews of all plots created by this job.&lt;br/&gt;
&lt;img alt="ALL" rel="nofollow" src="https://raw.githubusercontent.com/alexbuz/EC2_Metrics_Plotter/master/plot_reports/all.png"/&gt;&lt;/p&gt;
&lt;h2 id="download"&gt;Download&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a class="" href="https://github.com/alexbuz/EC2_Metrics_Plotter/archive/master.zip" rel="nofollow"&gt;Master Release&lt;/a&gt; -- &lt;code&gt;ec2metrics 0.1.0&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;/div&gt;</summary></entry><entry><title>S3_File_Uploader modified by Alex Buzunov</title><link href="https://sourceforge.net/p/aws-data-tools/wiki/S3_File_Uploader/" rel="alternate"/><published>2016-04-07T12:27:36.602000Z</published><updated>2016-04-07T12:27:36.602000Z</updated><author><name>Alex Buzunov</name><uri>https://sourceforge.net/u/alexbuz/</uri></author><id>https://sourceforge.net22981494841d39604482970aed88fe282083a8f1</id><summary type="html">&lt;div class="markdown_content"&gt;&lt;h1 id="s3-file-uploader-for-windows-cli"&gt;S3 File Uploader for Windows CLI.&lt;/h1&gt;
&lt;p&gt;Basic &lt;em&gt;file to Amazon-S3&lt;/em&gt; uploader.&lt;/p&gt;
&lt;p&gt;Features:&lt;br/&gt;
 - No need for Amazon AWS CLI&lt;br/&gt;
 - Works from your OS Windows desktop (command line)&lt;br/&gt;
 - Logs upload % progress to CLI screen.&lt;br/&gt;
 - It's executable (s3_percent_upload.exe)  - no need for Python install&lt;br/&gt;
 - It's 32 bit - it will work on any vanilla Windows.&lt;br/&gt;
 - Access keys are fed from CLI environment (not command line args)&lt;br/&gt;
 - Written using Python/boto/PyInstaller&lt;/p&gt;
&lt;h2 id="version"&gt;Version&lt;/h2&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;OS&lt;/th&gt;
&lt;th&gt;Platform&lt;/th&gt;
&lt;th&gt;Version&lt;/th&gt;
&lt;th&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Windows&lt;/td&gt;
&lt;td&gt;64bit&lt;/td&gt;
&lt;td&gt;&lt;span&gt;[0.1.0 beta]&lt;/span&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;h2 id="purpose"&gt;Purpose&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Ad-hoc file upload to Amazon S3.&lt;/li&gt;
&lt;li&gt;Optional upload to Reduced Redundancy storage (not RR by default).&lt;/li&gt;
&lt;li&gt;Optional "make it public" after upload (private by default)&lt;/li&gt;
&lt;li&gt;Custom S3 Key (defaulted to transfer file name)&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="audience"&gt;Audience&lt;/h2&gt;
&lt;p&gt;Business Analysts, AWS Developers, DevOps, &lt;/p&gt;
&lt;h2 id="designated-environment"&gt;Designated Environment&lt;/h2&gt;
&lt;p&gt;Pre-Prod (UAT/QA/DEV)&lt;/p&gt;
&lt;h2 id="amazon-s3-data-upload-price"&gt;Amazon S3 data upload price&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;It's free to upload a file to Amazon S3.&lt;/li&gt;
&lt;li&gt;Storage will cost you.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="usage"&gt;Usage&lt;/h2&gt;
&lt;div class="codehilite"&gt;&lt;pre&gt;## Upload file to S3.
##
## Upload % progress outputs to the screen.
##
Usage:
  set AWS_ACCESS_KEY_ID=&amp;lt;your access key&amp;gt;
  set AWS_SECRET_ACCESS_KEY=&amp;lt;your secret key&amp;gt;
  s3_percent_upload.exe &amp;lt;file_to_transfer&amp;gt; &amp;lt;bucket_name&amp;gt; [&amp;lt;s3_key_name&amp;gt;] [&amp;lt;use_rr&amp;gt;] [&amp;lt;public&amp;gt;]
        if &amp;lt;s3_key_name&amp;gt; is not specified, the filename will be used.
        --use_rr -- Use reduced redundancy storage.
        --public -- Make uploaded files public.

        Boto S3 docs: http://boto.cloudhackers.com/en/latest/ref/s3.html
&lt;/pre&gt;&lt;/div&gt;
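The upload-with-percent-progress behavior can be sketched with boto3's `upload_file` callback. This is an illustrative sketch, not the tool's source (the tool is built on legacy boto); `Progress`, `percent`, and `upload` are hypothetical names.

```python
import os
import threading

def percent(done, total):
    """Integer percentage, matching the tool's progress lines."""
    return int(done * 100 / total) if total else 100

class Progress:
    """Callback that prints 'Uploaded X bytes of Y (P%)' lines."""
    def __init__(self, total):
        self.total = total
        self.seen = 0
        self.lock = threading.Lock()   # boto3 may call back from worker threads

    def __call__(self, nbytes):
        with self.lock:
            self.seen += nbytes
            print('Uploaded %d bytes of %d (%d%%)'
                  % (self.seen, self.total, percent(self.seen, self.total)))

def upload(path, bucket, key=None, public=False, reduced_redundancy=False):
    """Upload a local file to S3, optionally RR storage and/or public-read."""
    import boto3
    key = key or os.path.basename(path)   # default S3 key is the filename
    extra = {}
    if reduced_redundancy:
        extra['StorageClass'] = 'REDUCED_REDUNDANCY'
    if public:
        extra['ACL'] = 'public-read'
    boto3.client('s3').upload_file(
        path, bucket, key, ExtraArgs=extra,
        Callback=Progress(os.path.getsize(path)))
```

Note that `percent(24576, 397799)` is 6, matching the sample log below.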


&lt;h2 id="environment-variables"&gt;Environment variables&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Set the following environment variables:&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="codehilite"&gt;&lt;pre&gt;set AWS_ACCESS_KEY_ID=&amp;lt;your access key&amp;gt;
set AWS_SECRET_ACCESS_KEY=&amp;lt;your secret key&amp;gt;
&lt;/pre&gt;&lt;/div&gt;


&lt;h1 id="example"&gt;Example&lt;/h1&gt;
&lt;h2 id="upload-file-to-amazon-s3-reduced-redundancy-storage-and-make-in-publicly-accessible"&gt;Upload file to Amazon-S3 Reduced Redundancy storage and make in Publicly accessible&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;S3_RR_Public_upload.bat&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="codehilite"&gt;&lt;pre&gt;set AWS_ACCESS_KEY_ID=&amp;lt;your access key&amp;gt;
set AWS_SECRET_ACCESS_KEY=&amp;lt;your secret key&amp;gt;

cd c:\tmp\S3_Uploader
s3_percent_upload.exe c:\tmp\data.zip test123 --use_rr -public
&lt;/pre&gt;&lt;/div&gt;


&lt;ul&gt;
&lt;li&gt;result.log (S3_RR_Public_upload.bat &amp;gt; result.log)&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="codehilite"&gt;&lt;pre&gt;Connecting to S3...
File size: 388.5KiB
Public = True
ReducedRedundancy = True
Uploaded 0 bytes of 397799 (0%)
Uploaded 24576 bytes of 397799 (6%)
Uploaded 49152 bytes of 397799 (12%)
Uploaded 73728 bytes of 397799 (18%)
Uploaded 98304 bytes of 397799 (24%)
Uploaded 122880 bytes of 397799 (30%)
Uploaded 147456 bytes of 397799 (37%)
Uploaded 172032 bytes of 397799 (43%)
Uploaded 196608 bytes of 397799 (49%)
Uploaded 221184 bytes of 397799 (55%)
Uploaded 245760 bytes of 397799 (61%)
Uploaded 270336 bytes of 397799 (67%)
Uploaded 294912 bytes of 397799 (74%)
Uploaded 319488 bytes of 397799 (80%)
Uploaded 344064 bytes of 397799 (86%)
Uploaded 368640 bytes of 397799 (92%)
Uploaded 393216 bytes of 397799 (98%)
Upload complete.
Your file is at: https://s3-website-us-west-2.amazonaws.com/test123/data.zip

Time elapsed: 2.54299998283 seconds
&lt;/pre&gt;&lt;/div&gt;


&lt;h2 id="download"&gt;Download&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a class="" href="https://github.com/alexbuz/S3_File_Uploader/archive/master.zip" rel="nofollow"&gt;Master Release&lt;/a&gt; -- &lt;code&gt;s3_percent_uploader 0.1.0&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;/div&gt;</summary></entry><entry><title>S3_Sanity_Check modified by Alex Buzunov</title><link href="https://sourceforge.net/p/aws-data-tools/wiki/S3_Sanity_Check/" rel="alternate"/><published>2016-04-07T12:26:09.666000Z</published><updated>2016-04-07T12:26:09.666000Z</updated><author><name>Alex Buzunov</name><uri>https://sourceforge.net/u/alexbuz/</uri></author><id>https://sourceforge.net1dd008aa0f69f5e1083267cc8f5b5780767d3aa2</id><summary type="html">&lt;div class="markdown_content"&gt;&lt;pre&gt;&lt;/pre&gt;
&lt;/div&gt;</summary></entry><entry><title>S3_Sanity_Check modified by Alex Buzunov</title><link href="https://sourceforge.net/p/aws-data-tools/wiki/S3_Sanity_Check/" rel="alternate"/><published>2016-04-07T12:25:34.359000Z</published><updated>2016-04-07T12:25:34.359000Z</updated><author><name>Alex Buzunov</name><uri>https://sourceforge.net/u/alexbuz/</uri></author><id>https://sourceforge.net9974a4717abd9bb50193010b06b3d973d552dac6</id><summary type="html">&lt;div class="markdown_content"&gt;&lt;h1 id="s3-bucket-sanity-check-for-windows-cli"&gt;S3 bucket sanity check for Windows CLI.&lt;/h1&gt;
&lt;p&gt;Simple sanity (public access) check for Amazon-S3 bucket.&lt;/p&gt;
&lt;p&gt;Features:&lt;br/&gt;
 - Checks if given Amazon-S3 bucket is publicly accessible or not.&lt;br/&gt;
 - No need for Amazon AWS CLI&lt;br/&gt;
 - Works from your OS Windows desktop (command line)&lt;br/&gt;
 - It's executable (s3sanity.exe)  - no need for Python install&lt;br/&gt;
 - It's 32 bit - it will work on any vanilla Windows.&lt;br/&gt;
 - No AWS Access Keys needed. &lt;br/&gt;
 - Written using Python/boto/PyInstaller&lt;/p&gt;
&lt;h2 id="version"&gt;Version&lt;/h2&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;OS&lt;/th&gt;
&lt;th&gt;Platform&lt;/th&gt;
&lt;th&gt;Version&lt;/th&gt;
&lt;th&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Windows&lt;/td&gt;
&lt;td&gt;32bit&lt;/td&gt;
&lt;td&gt;&lt;span&gt;[0.1.0 beta]&lt;/span&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;h2 id="purpose"&gt;Purpose&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;s3sanity&lt;/code&gt; helps you confirm that a given bucket is not publicly accessible.&lt;/li&gt;
&lt;li&gt;Helps you find readable buckets on Amazon-S3 (for fun).&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="how-it-works"&gt;How it works&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;s3sanity.exe tries to read the given bucket from Amazon-S3.&lt;/li&gt;
&lt;li&gt;Prints a &lt;code&gt;success&lt;/code&gt; message if the bucket is publicly accessible (readable).&lt;/li&gt;
&lt;li&gt;Prints an &lt;code&gt;error&lt;/code&gt; message if the bucket does not exist or is not readable.&lt;/li&gt;
&lt;li&gt;It will not elaborate on why the read attempt failed.&lt;/li&gt;
&lt;/ul&gt;
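The check described above can be sketched as an anonymous (unsigned) HEAD request against the bucket. Illustrative only; `bucket_is_public` and `status_message` are hypothetical names, and the sketch uses boto3 rather than the tool's legacy boto.

```python
def bucket_is_public(name):
    """HEAD the bucket without credentials; True means publicly readable."""
    import boto3
    import botocore
    from botocore.config import Config
    # Unsigned config = no access keys needed, exactly a public-access test.
    s3 = boto3.client('s3', config=Config(signature_version=botocore.UNSIGNED))
    try:
        s3.head_bucket(Bucket=name)
        return True
    except botocore.exceptions.ClientError:
        # 403 (no access) and 404 (no such bucket) are not distinguished here,
        # matching the tool's single "NO access OR does NOT exists" message.
        return False

def status_message(ok, name):
    """Render a tool-style verdict line."""
    if ok:
        return 'You HAVE access to "%s"' % name
    return 'NO access OR does NOT exist ("%s")' % name
```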
&lt;h2 id="audience"&gt;Audience&lt;/h2&gt;
&lt;p&gt;Database/ETL developers, Data Integrators, Data Engineers, Business Analysts, AWS Developers, DevOps&lt;/p&gt;
&lt;h2 id="designated-environment"&gt;Designated Environment&lt;/h2&gt;
&lt;p&gt;Pre-Prod (UAT/QA/DEV)&lt;/p&gt;
&lt;h2 id="usage"&gt;Usage&lt;/h2&gt;
&lt;div class="codehilite"&gt;&lt;pre&gt;C:\Python35-32&amp;gt;dist\s3sanity.exe
## S3 sanity check.
##
## Outputs access status to the screen.
##
Usage:

  s3_sanity.exe -b &amp;lt;bucket_name&amp;gt;

    -b [--bucket] -- S3 bucket name.

&lt;/pre&gt;&lt;/div&gt;


&lt;h2 id="environment-variables"&gt;Environment variables&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;No environment variables required because it's a public access test.&lt;/li&gt;
&lt;/ul&gt;
&lt;h1 id="examples"&gt;Examples&lt;/h1&gt;
&lt;h3 id="how-to-check-if-s3-bucket-is-publicly-readable-or-not"&gt;How to check if S3 bucket is publicly readable or not?&lt;/h3&gt;
&lt;h4 id="testing-if-bucket-test-is-readable-by-everyone"&gt;Testing if bucket &lt;code&gt;test&lt;/code&gt; is readable by everyone.&lt;/h4&gt;
&lt;div class="codehilite"&gt;&lt;pre&gt;C:\Python35-32&amp;gt;dist\s3sanity.exe -b test
#####################################
NO access OR does NOT exists ("test")
#####################################
&lt;/pre&gt;&lt;/div&gt;


&lt;p&gt;Bucket &lt;code&gt;test&lt;/code&gt; does not exist or is not readable by everyone.&lt;/p&gt;
&lt;h4 id="testing-if-bucket-test2-is-readable-by-everyone"&gt;Testing if bucket &lt;code&gt;test2&lt;/code&gt; is readable by everyone.&lt;/h4&gt;
&lt;div class="codehilite"&gt;&lt;pre&gt;C:\Python35-32&amp;gt;dist\s3sanity.exe -b test2

You HAVE access to "test2"
&lt;/pre&gt;&lt;/div&gt;


&lt;p&gt;Hooray! Bucket &lt;code&gt;test2&lt;/code&gt; is readable by everyone!&lt;/p&gt;
&lt;h4 id="testing-if-bucket-elvis-is-readable-by-everyone"&gt;Testing if bucket "elvis" is readable by everyone.&lt;/h4&gt;
&lt;div class="codehilite"&gt;&lt;pre&gt;C:\Python35-32&amp;gt;dist\s3sanity.exe -b elvis
#####################################
NO access OR does NOT exists ("elvis")
#####################################
&lt;/pre&gt;&lt;/div&gt;


&lt;p&gt;Too bad. Bucket &lt;code&gt;elvis&lt;/code&gt; does not exist or is not readable by everyone.&lt;/p&gt;
&lt;h4 id="testing-if-bucket-refuse-is-readable-by-everyone"&gt;Testing if bucket "refuse" is readable by everyone.&lt;/h4&gt;
&lt;div class="codehilite"&gt;&lt;pre&gt;C:\Python35-32&amp;gt;dist\s3sanity.exe -b refuse

You HAVE access to "refuse"
&lt;/pre&gt;&lt;/div&gt;


&lt;p&gt;Hooray! Bucket &lt;code&gt;refuse&lt;/code&gt; is readable by everyone!&lt;/p&gt;
&lt;h2 id="download"&gt;Download&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a class="" href="https://github.com/alexbuz/S3_Sanity_Check/archive/master.zip" rel="nofollow"&gt;Master Release&lt;/a&gt; -- &lt;code&gt;s3sanity 0.1.0&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h1 id="faq"&gt;FAQ&lt;/h1&gt;
&lt;h4 id="can-it-tell-if-s3-bucket-is-unreadable"&gt;Can it tell if S3 bucket is unreadable?&lt;/h4&gt;
&lt;p&gt;No, it cannot really tell if it's unreadable or simply does not exist.&lt;/p&gt;
&lt;h4 id="do-i-need-a-password-to-use-it"&gt;Do I need a password to use it?&lt;/h4&gt;
&lt;p&gt;No, no passwords or access keys.&lt;br/&gt;
We are testing public read access to a bucket.&lt;/p&gt;
&lt;h4 id="how-does-it-work"&gt;How does it work?&lt;/h4&gt;
&lt;p&gt;I use the Python boto module (AWS API for Python) to interact with AWS.&lt;br/&gt;
Boto issues a HEAD request for the bucket; if it succeeds, the bucket is readable.&lt;/p&gt;
&lt;div class="codehilite"&gt;&lt;pre&gt;s3.meta.client.head_bucket(Bucket=bucket.name)
&lt;/pre&gt;&lt;/div&gt;


&lt;h4 id="what-are-the-other-ways-to-do-it"&gt;What are the other ways to do it?&lt;/h4&gt;
&lt;p&gt;You can use AWS CLI &lt;code&gt;aws s3api&lt;/code&gt; to do the same.&lt;/p&gt;
&lt;h4 id="can-i-check-multiple-buckets-for-readability"&gt;Can I check multiple buckets for readability?&lt;/h4&gt;
&lt;p&gt;No. To do that, you have to call &lt;code&gt;s3sanity.exe&lt;/code&gt; once per S3 bucket.&lt;/p&gt;
&lt;h4 id="does-check-contents-of-the-bucket-for-readability"&gt;Does check contents of the bucket for readability?&lt;/h4&gt;
&lt;p&gt;No&lt;/p&gt;
&lt;h4 id="does-it-create-any-manifest-files"&gt;Does it create any manifest files?&lt;/h4&gt;
&lt;p&gt;No&lt;/p&gt;
&lt;h4 id="can-i-use-linux"&gt;Can I use Linux.&lt;/h4&gt;
&lt;p&gt;No, only OS Windows for now.&lt;/p&gt;
&lt;h4 id="what-technology-was-used-to-create-this-tool"&gt;What technology was used to create this tool&lt;/h4&gt;
&lt;p&gt;I used Python and Boto (AWS API for Python) to write it.&lt;/p&gt;
&lt;h4 id="where-are-the-sources"&gt;Where are the sources?&lt;/h4&gt;
&lt;p&gt;Please, contact me for sources.&lt;/p&gt;
&lt;h4 id="can-you-modify-functionality-and-add-features"&gt;Can you modify functionality and add features?&lt;/h4&gt;
&lt;p&gt;Yes, please, ask me for new features.&lt;/p&gt;
&lt;h4 id="what-other-aws-tools-youve-created"&gt;What other AWS tools you've created?&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a class="alink" href="/p/aws-data-tools/wiki/CSV_Loader_For_Redshift/"&gt;[CSV_Loader_For_Redshift]&lt;/a&gt; (https://github.com/alexbuz/CSV_Loader_For_Redshift/blob/master/README.md) - Append CSV data to Amazon-Redshift from Windows.&lt;/li&gt;
&lt;li&gt;&lt;a class="" href="https://github.com/alexbuz/EC2_Metrics_Plotter/blob/master/README.md" rel="nofollow"&gt;EC2_Metrics_Plotter&lt;/a&gt; - plots any CloudWatch EC2 instance  metric stats.&lt;/li&gt;
&lt;li&gt;&lt;a class="" href="https://github.com/alexbuz/S3_File_Uploader/blob/master/README.md" rel="nofollow"&gt;S3_File_Uploader&lt;/a&gt; - uploads file from Windows to S3.&lt;/li&gt;
&lt;/ul&gt;
&lt;h4 id="do-you-have-any-aws-certifications"&gt;Do you have any AWS Certifications?&lt;/h4&gt;
&lt;p&gt;Yes, &lt;a class="" href="https://raw.githubusercontent.com/alexbuz/FAQs/master/images/AWS_Ceritied_Developer_Associate.png" rel="nofollow"&gt;AWS Certified Developer (Associate)&lt;/a&gt;&lt;/p&gt;
&lt;h4 id="can-you-create-similarcustom-data-tool-for-our-business"&gt;Can you create similar/custom data tool for our business?&lt;/h4&gt;
&lt;p&gt;Yes, you can PM me here or email at &lt;code&gt;alex_buz@yahoo.com&lt;/code&gt;.&lt;br/&gt;
I'll get back to you within hours.&lt;/p&gt;
&lt;h3 id="links"&gt;Links&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a class="" href="https://github.com/alexbuz/FAQs/blob/master/README.md" rel="nofollow"&gt;Employment FAQ&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;/div&gt;</summary></entry><entry><title>CSV_Loader_For_Redshift modified by Alex Buzunov</title><link href="https://sourceforge.net/p/aws-data-tools/wiki/CSV_Loader_For_Redshift/" rel="alternate"/><published>2016-04-07T11:47:42.095000Z</published><updated>2016-04-07T11:47:42.095000Z</updated><author><name>Alex Buzunov</name><uri>https://sourceforge.net/u/alexbuz/</uri></author><id>https://sourceforge.net723c4e7a91d564e10e81010ce5c60d5ef9aa9e6a</id><summary type="html">&lt;div class="markdown_content"&gt;&lt;pre&gt;&lt;/pre&gt;
&lt;/div&gt;</summary></entry><entry><title>CSV_Loader_For_Redshift modified by Alex Buzunov</title><link href="https://sourceforge.net/p/aws-data-tools/wiki/CSV_Loader_For_Redshift/" rel="alternate"/><published>2016-04-07T11:46:34.900000Z</published><updated>2016-04-07T11:46:34.900000Z</updated><author><name>Alex Buzunov</name><uri>https://sourceforge.net/u/alexbuz/</uri></author><id>https://sourceforge.netb1e287cc83dfd5a9972938cc3442f109849c45b7</id><summary type="html">&lt;div class="markdown_content"&gt;&lt;h1 id="csv-file-loader-for-amazon-redshift-db"&gt;CSV File Loader for Amazon Redshift DB.&lt;/h1&gt;
&lt;p&gt;Loads CSV file to Amazon-Redshift table from Windows command line.&lt;/p&gt;
&lt;p&gt;Features:&lt;br/&gt;
 - Loads local (to your Windows desktop) CSV file to Amazon Redshift.&lt;br/&gt;
 - No need to preload your data to S3 prior to insert to Redshift.&lt;br/&gt;
 - No need for Amazon AWS CLI.&lt;br/&gt;
 - Works from your OS Windows desktop (command line).&lt;br/&gt;
 - It's executable (csv_loader_for_redshift.exe)  - no need for Python install.&lt;br/&gt;
 - It's 32 bit - it will work on any vanilla Windows.&lt;br/&gt;
 - AWS Access Keys are not passed as arguments. &lt;br/&gt;
 - Written using Python/boto/PyInstaller.&lt;/p&gt;
&lt;h2 id="version"&gt;Version&lt;/h2&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;OS&lt;/th&gt;
&lt;th&gt;Platform&lt;/th&gt;
&lt;th&gt;Version&lt;/th&gt;
&lt;th&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Windows&lt;/td&gt;
&lt;td&gt;32bit&lt;/td&gt;
&lt;td&gt;&lt;span&gt;[0.1.0 beta]&lt;/span&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;h2 id="purpose"&gt;Purpose&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Ad-hoc CSV file load to Amazon Redshift table.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="how-it-works"&gt;How it works&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;File is staged on S3 prior to load to Redshift&lt;/li&gt;
&lt;li&gt;Optional upload to Reduced Redundancy storage (not RR by default).&lt;/li&gt;
&lt;li&gt;Optional "make it public" after upload (private by default)&lt;/li&gt;
&lt;li&gt;S3 Key defaulted to transfer file name.&lt;/li&gt;
&lt;li&gt;Load is done using COPY command&lt;/li&gt;
&lt;li&gt;Target Redshift table has to exist&lt;/li&gt;
&lt;li&gt;It's a Python/boto/psycopg2 script&lt;ul&gt;
&lt;li&gt;Boto S3 docs: &lt;a href="http://boto.cloudhackers.com/en/latest/ref/s3.html" rel="nofollow"&gt;http://boto.cloudhackers.com/en/latest/ref/s3.html&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;psycopg2 docs: &lt;a href="http://initd.org/psycopg/docs/" rel="nofollow"&gt;http://initd.org/psycopg/docs/&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Executable is created using &lt;a href="http://www.pyinstaller.org/" rel="nofollow"&gt;pyInstaller&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="audience"&gt;Audience&lt;/h2&gt;
&lt;p&gt;Database/ETL developers, Data Integrators, Data Engineers, Business Analysts, AWS Developers, DevOps, &lt;/p&gt;
&lt;h2 id="designated-environment"&gt;Designated Environment&lt;/h2&gt;
&lt;p&gt;Pre-Prod (UAT/QA/DEV)&lt;/p&gt;
&lt;h2 id="usage"&gt;Usage&lt;/h2&gt;
&lt;div class="codehilite"&gt;&lt;pre&gt;## Load CSV file to Amazon Redshift table.
##
## Load % progress outputs to the screen.
##
Usage:  
  set AWS_ACCESS_KEY_ID=&amp;lt;your access key&amp;gt;
  set AWS_SECRET_ACCESS_KEY=&amp;lt;your secret key&amp;gt;
  set REDSHIFT_CONNECT_STRING="dbname='***' port='5439' user='***' password='***' host='mycluster.***.redshift.amazonaws.com'"  
  csv_loader_for_redshift.py &amp;lt;file_to_transfer&amp;gt; &amp;lt;bucket_name&amp;gt; [&amp;lt;use_rr&amp;gt;] [&amp;lt;public&amp;gt;]
                         [&amp;lt;delim&amp;gt;] [&amp;lt;quote&amp;gt;] [&amp;lt;to_table&amp;gt;] [&amp;lt;gzip_source_file&amp;gt;]

    --use_rr -- Use reduced redundancy storage (False).
    --public -- Make uploaded files public (False).
    --delim  -- CSV file delimiter (',').
    --quote  -- CSV quote ('"').
    --to_table  -- Target Amazon-Redshift table name.
    --gzip_source_file  -- gzip input CSV file before upload to Amazon-S3 (False).

    Input filename will be used for S3 key name.

    Boto S3 docs: http://boto.cloudhackers.com/en/latest/ref/s3.html
    psycopg2 docs: http://initd.org/psycopg/docs/

&lt;/pre&gt;&lt;/div&gt;
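The two-step load (stage on S3, then COPY into Redshift over psycopg2) can be sketched as follows. Illustrative only, not the tool's source; `build_copy` and `load` are hypothetical names, and the COPY options mirror the flags documented above.

```python
def build_copy(table, bucket, key, delim=',', quote='"', gzipped=False):
    """Assemble the Redshift COPY statement the loader runs via psycopg2.
    Credentials are spliced in from the environment, as the tool does."""
    import os
    creds = 'aws_access_key_id=%s;aws_secret_access_key=%s' % (
        os.environ['AWS_ACCESS_KEY_ID'], os.environ['AWS_SECRET_ACCESS_KEY'])
    return ("COPY %s FROM 's3://%s/%s' CREDENTIALS '%s' "
            "CSV DELIMITER '%s' QUOTE '%s'%s"
            % (table, bucket, key, creds, delim, quote,
               ' GZIP' if gzipped else ''))

def load(table, bucket, key, **kw):
    """Run the COPY against the cluster named in REDSHIFT_CONNECT_STRING."""
    import os
    import psycopg2
    # The env var is set with surrounding quotes in the examples; strip them.
    dsn = os.environ['REDSHIFT_CONNECT_STRING'].strip('"')
    conn = psycopg2.connect(dsn)
    with conn, conn.cursor() as cur:   # commits on success, rolls back on error
        cur.execute(build_copy(table, bucket, key, **kw))
```

The staging-to-S3 half is an ordinary boto upload, as in the S3_File_Uploader example.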


&lt;h1 id="example"&gt;Example&lt;/h1&gt;
&lt;h3 id="environment-variables"&gt;Environment variables&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Set the following environment variables (for all tests):&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="codehilite"&gt;&lt;pre&gt;set AWS_ACCESS_KEY_ID=&amp;lt;your access key&amp;gt;
set AWS_SECRET_ACCESS_KEY=&amp;lt;your secret key&amp;gt;

set REDSHIFT_CONNECT_STRING="dbname='***' port='5439' user='***' password='***' host='mycluster.***.redshift.amazonaws.com'"  
&lt;/pre&gt;&lt;/div&gt;


&lt;h3 id="csv-file-upload-into-redshift-table-test2"&gt;CSV file upload into Redshift table &lt;code&gt;test2&lt;/code&gt;&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;examples\Load_CSV_To_Redshift_Table.bat&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="codehilite"&gt;&lt;pre&gt;set AWS_ACCESS_KEY_ID=&amp;lt;your access key&amp;gt;
set AWS_SECRET_ACCESS_KEY=&amp;lt;your secret key&amp;gt;
set REDSHIFT_CONNECT_STRING="dbname='***' port='5439' user='***' password='***' host='mycluster.***.redshift.amazonaws.com'"  

cd c:\tmp\CSV_Loader
csv_loader_for_redshift.exe c:\tmp\data.csv test123 -r -d "," -t test2 -z
&lt;/pre&gt;&lt;/div&gt;


&lt;ul&gt;
&lt;li&gt;result.log (Load_CSV_To_Redshift_Table.bat &amp;gt; result.log)&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="codehilite"&gt;&lt;pre&gt;S3        | data.csv.gz | 100%
Redshift  | test2       | DONE
Time elapsed: 5.7 seconds
&lt;/pre&gt;&lt;/div&gt;


&lt;h2 id="test-prerequisits"&gt;Test prerequisits.&lt;/h2&gt;
&lt;h4 id="target-redshift-table-ddl"&gt;Target Redshift table DDL&lt;/h4&gt;
&lt;div class="codehilite"&gt;&lt;pre&gt;CREATE TABLE test2 (id integer , num integer, data varchar,num2 integer, data2 varchar,num3 
integer, data3 varchar,num4 integer, data4 varchar);
&lt;/pre&gt;&lt;/div&gt;


&lt;h4 id="test-data"&gt;Test data&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;Test data is in file examples\data.csv&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="sources"&gt;Sources&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Will add them as soon as I clean them up and remove all the passwords and AWS keys :-).&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="download"&gt;Download&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;git clone https://github.com/alexbuz/CSV_Loader_For_Redshift&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="" href="https://github.com/alexbuz/CSV_Loader_For_Redshift/archive/master.zip" rel="nofollow"&gt;Master Release&lt;/a&gt; -- &lt;code&gt;csv_loader_for_redshift 0.1.0&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h1 id="faq"&gt;FAQ&lt;/h1&gt;
&lt;h4 id="can-it-load-csv-file-from-windows-desktop-to-amazon-redshift"&gt;Can it load CSV file from Windows desktop to Amazon Redshift.&lt;/h4&gt;
&lt;p&gt;Yes, it is the main purpose of this tool.&lt;/p&gt;
&lt;h4 id="can-developers-integrate-csv-loader-into-their-etl-pipelines"&gt;Can developers integrate CSV loader into their ETL pipelines?&lt;/h4&gt;
&lt;p&gt;Yes. Assuming they are doing it on OS Windows.&lt;/p&gt;
&lt;h4 id="how-fast-is-data-upload-using-csv-loader-for-redshift"&gt;How fast is data upload using &lt;code&gt;CSV Loader for Redshift&lt;/code&gt;?&lt;/h4&gt;
&lt;p&gt;As fast as any AWS API provided by Amazon.&lt;/p&gt;
&lt;h4 id="how-to-inscease-upload-speed"&gt;How to inscease upload speed?&lt;/h4&gt;
&lt;p&gt;Compress input file or provide &lt;code&gt;-z&lt;/code&gt; or &lt;code&gt;--gzip_source_file&lt;/code&gt; arg in command line and this tool will compress it for you before upload to S3.&lt;/p&gt;
&lt;h4 id="what-are-the-other-ways-to-upload-file-to-redshift"&gt;What are the other ways to upload file to Redshift?&lt;/h4&gt;
&lt;p&gt;You can use 'aws s3api' and psql COPY command to do pretty much the same.&lt;/p&gt;
&lt;h4 id="can-i-just-zip-it-using-windows-file-explorer"&gt;Can I just zip it using Windows File Explorer?&lt;/h4&gt;
&lt;p&gt;No, Redshift will not recognize the *.zip file format.&lt;br/&gt;
You have to &lt;code&gt;gzip&lt;/code&gt; it. You can use 7-Zip to do that.&lt;/p&gt;
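Gzipping before upload needs nothing beyond the Python standard library. A sketch of what `--gzip_source_file` effectively does (`gzip_file` is a hypothetical helper for illustration, not the tool's API):

```python
import gzip
import shutil

def gzip_file(src):
    """Compress src to src + '.gz', the format Redshift's COPY ... GZIP expects."""
    dst = src + '.gz'
    with open(src, 'rb') as fin, gzip.open(dst, 'wb') as fout:
        shutil.copyfileobj(fin, fout)   # stream copy; no need to load the CSV in memory
    return dst
```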
&lt;h4 id="does-it-delete-file-from-s3-after-upload"&gt;Does it delete file from S3 after upload?&lt;/h4&gt;
&lt;p&gt;No&lt;/p&gt;
&lt;h4 id="does-it-create-target-redshift-table"&gt;Does it create target Redshift table?&lt;/h4&gt;
&lt;p&gt;No&lt;/p&gt;
&lt;h4 id="is-there-an-option-to-compress-input-csv-file-before-upload"&gt;Is there an option to compress input CSV file before upload?&lt;/h4&gt;
&lt;p&gt;Yes. Use &lt;code&gt;-z&lt;/code&gt; or &lt;code&gt;--gzip_source_file&lt;/code&gt; argument so the tool does compression for you.&lt;/p&gt;
&lt;h4 id="explain-first-step-of-data-load"&gt;Explain first step of data load?&lt;/h4&gt;
&lt;p&gt;The CSV file you provide is first staged on Amazon-S3.&lt;br/&gt;
It does not have to be made public for the load to Redshift.&lt;br/&gt;
It can be compressed or uncompressed.&lt;br/&gt;
Your input file is compressed (optionally) and uploaded to S3 using the credentials you set in the shell.&lt;/p&gt;
&lt;h4 id="explain-second-step-of-data-load-how-data-is-loaded-to-amazon-redshift"&gt;Explain second step of data load. How data is loaded to Amazon Redshift?&lt;/h4&gt;
&lt;p&gt;Your Redshift cluster has to be open to the world (accessible via port 5439 from the internet).&lt;br/&gt;
The tool uses the PostgreSQL COPY command to load the file located on S3 into the Redshift table.&lt;/p&gt;
&lt;h4 id="can-i-use-winzip-or-7-zip"&gt;Can I use WinZip or 7-zip&lt;/h4&gt;
&lt;p&gt;Yes, but you have to use 'gzip' compression type.&lt;/p&gt;
&lt;h4 id="what-technology-was-used-to-create-this-tool"&gt;What technology was used to create this tool&lt;/h4&gt;
&lt;p&gt;I used Python, Boto, and psycopg2 to write it.&lt;br/&gt;
Boto is used to upload the file to S3.&lt;br/&gt;
psycopg2 is used to open a native PostgreSQL connection to the Redshift cluster and execute the &lt;code&gt;COPY&lt;/code&gt; command.&lt;/p&gt;
&lt;h4 id="where-are-the-sources"&gt;Where are the sources?&lt;/h4&gt;
&lt;p&gt;Please contact me for the sources.&lt;/p&gt;
&lt;h4 id="can-you-modify-functionality-and-add-features"&gt;Can you modify functionality and add features?&lt;/h4&gt;
&lt;p&gt;Yes, please ask me for new features.&lt;/p&gt;
&lt;h4 id="what-other-aws-tools-youve-created"&gt;What other AWS tools you've created?&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a class="" href="https://github.com/alexbuz/S3_Sanity_Check/blob/master/README.md" rel="nofollow"&gt;S3_Sanity_Check&lt;/a&gt; - lets you &lt;code&gt;ping&lt;/code&gt; an Amazon S3 bucket to see if it's publicly readable.&lt;/li&gt;
&lt;li&gt;&lt;a class="" href="https://github.com/alexbuz/EC2_Metrics_Plotter/blob/master/README.md" rel="nofollow"&gt;EC2_Metrics_Plotter&lt;/a&gt; - plots any CloudWatch EC2 instance  metric stats.&lt;/li&gt;
&lt;li&gt;&lt;a class="" href="https://github.com/alexbuz/S3_File_Uploader/blob/master/README.md" rel="nofollow"&gt;S3_File_Uploader&lt;/a&gt; - uploads file from Windows to S3.&lt;/li&gt;
&lt;/ul&gt;
&lt;h4 id="do-you-have-any-aws-certifications"&gt;Do you have any AWS Certifications?&lt;/h4&gt;
&lt;p&gt;Yes, &lt;a class="" href="https://raw.githubusercontent.com/alexbuz/FAQs/master/images/AWS_Ceritied_Developer_Associate.png" rel="nofollow"&gt;AWS Certified Developer (Associate)&lt;/a&gt;&lt;/p&gt;
&lt;h4 id="can-you-create-similarcustom-data-tool-for-our-business"&gt;Can you create similar/custom data tool for our business?&lt;/h4&gt;
&lt;p&gt;Yes, you can PM me here or email at &lt;code&gt;alex_buz@yahoo.com&lt;/code&gt;.&lt;br/&gt;
I'll get back to you within hours.&lt;/p&gt;
&lt;h3 id="links"&gt;Links&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a class="" href="https://github.com/alexbuz/FAQs/blob/master/README.md" rel="nofollow"&gt;Employment FAQ&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;/div&gt;</summary></entry><entry><title>Discussion for Home page</title><link href="https://sourceforge.net/p/aws-data-tools/wiki/Home/" rel="alternate"/><published>2016-04-07T11:44:47.520000Z</published><updated>2016-04-07T11:44:47.520000Z</updated><author><name>Alex Buzunov</name><uri>https://sourceforge.net/u/alexbuz/</uri></author><id>https://sourceforge.net9238976ed3c8281e57167624072dec65d3f76b69</id><summary type="html">&lt;div class="markdown_content"&gt;&lt;p&gt;&lt;a class="" href="https://github.com/alexbuz/CSV_Loader_For_Redshift" rel="nofollow"&gt;CSV_Loader_For_Redshift.zip&lt;/a&gt;&lt;/p&gt;
&lt;div class="codehilite"&gt;&lt;pre&gt;Loads CSV file to Amazon-Redshift table from Windows command line.

Features:

Loads local (to your Windows desktop) CSV file to Amazon Redshift.
No need to preload your data to S3 prior to insert to Redshift.
No need for Amazon AWS CLI.
Works from your OS Windows desktop (command line).
It's executable (csv_loader_for_redshift.exe) - no need for Python install.
It's 32 bit - it will work on any vanilla Windows.
AWS Access Keys are not passed as arguments.
Written using Python/boto/PyInstaller.
&lt;/pre&gt;&lt;/div&gt;

&lt;/div&gt;</summary></entry></feed>