<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Recent changes to CSV_Loader_For_Redshift</title><link>https://sourceforge.net/p/aws-data-tools/wiki/CSV_Loader_For_Redshift/</link><description>Recent changes to CSV_Loader_For_Redshift</description><atom:link href="https://sourceforge.net/p/aws-data-tools/wiki/CSV_Loader_For_Redshift/feed" rel="self"/><language>en</language><lastBuildDate>Thu, 07 Apr 2016 11:47:42 -0000</lastBuildDate><atom:link href="https://sourceforge.net/p/aws-data-tools/wiki/CSV_Loader_For_Redshift/feed" rel="self" type="application/rss+xml"/><item><title>CSV_Loader_For_Redshift modified by Alex Buzunov</title><link>https://sourceforge.net/p/aws-data-tools/wiki/CSV_Loader_For_Redshift/</link><description>&lt;div class="markdown_content"&gt;&lt;pre&gt;&lt;/pre&gt;
&lt;/div&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Alex Buzunov</dc:creator><pubDate>Thu, 07 Apr 2016 11:47:42 -0000</pubDate><guid>https://sourceforge.net723c4e7a91d564e10e81010ce5c60d5ef9aa9e6a</guid></item><item><title>CSV_Loader_For_Redshift modified by Alex Buzunov</title><link>https://sourceforge.net/p/aws-data-tools/wiki/CSV_Loader_For_Redshift/</link><description>&lt;div class="markdown_content"&gt;&lt;h1 id="csv-file-loader-for-amazon-redshift-db"&gt;CSV File Loader for Amazon Redshift DB.&lt;/h1&gt;
&lt;p&gt;Loads a CSV file into an Amazon Redshift table from the Windows command line.&lt;/p&gt;
&lt;p&gt;Features:&lt;br/&gt;
 - Loads a local CSV file (on your Windows desktop) to Amazon Redshift.&lt;br/&gt;
 - No need to preload your data to S3 before inserting into Redshift.&lt;br/&gt;
 - No need for the Amazon AWS CLI.&lt;br/&gt;
 - Works from your Windows desktop (command line).&lt;br/&gt;
 - It's an executable (csv_loader_for_redshift.exe) - no Python install needed.&lt;br/&gt;
 - It's 32-bit - it will run on any vanilla Windows.&lt;br/&gt;
 - AWS Access Keys are not passed as arguments.&lt;br/&gt;
 - Written using Python/boto/PyInstaller.&lt;/p&gt;
&lt;h2 id="version"&gt;Version&lt;/h2&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;OS&lt;/th&gt;
&lt;th&gt;Platform&lt;/th&gt;
&lt;th&gt;Version&lt;/th&gt;
&lt;th&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Windows&lt;/td&gt;
&lt;td&gt;32bit&lt;/td&gt;
&lt;td&gt;&lt;span&gt;0.1.0 beta&lt;/span&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;h2 id="purpose"&gt;Purpose&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Ad-hoc CSV file load to Amazon Redshift table.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="how-it-works"&gt;How it works&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;The file is staged on S3 before being loaded into Redshift.&lt;/li&gt;
&lt;li&gt;Optional upload to Reduced Redundancy storage (standard storage by default).&lt;/li&gt;
&lt;li&gt;Optional "make it public" after upload (private by default).&lt;/li&gt;
&lt;li&gt;The S3 key defaults to the transfer file name.&lt;/li&gt;
&lt;li&gt;The load is done using the COPY command.&lt;/li&gt;
&lt;li&gt;The target Redshift table has to exist.&lt;/li&gt;
&lt;li&gt;It's a Python/boto/psycopg2 script&lt;ul&gt;
&lt;li&gt;Boto S3 docs: &lt;a href="http://boto.cloudhackers.com/en/latest/ref/s3.html" rel="nofollow"&gt;http://boto.cloudhackers.com/en/latest/ref/s3.html&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;psycopg2 docs: &lt;a href="http://initd.org/psycopg/docs/" rel="nofollow"&gt;http://initd.org/psycopg/docs/&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;The executable is created using &lt;a href="http://www.pyinstaller.org/" rel="nofollow"&gt;PyInstaller&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
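The two steps above (stage the file on S3, then COPY it into the table) can be sketched in Python. This is an illustrative helper, not the tool's actual source; the function name and defaults are assumptions, and the statement follows Redshift's documented `COPY ... CREDENTIALS ... CSV` syntax:

```python
def build_copy_sql(table, bucket, key, access_key, secret_key,
                   delim=",", quote='"', gzipped=False):
    """Build the Redshift COPY statement used in the second step."""
    creds = "aws_access_key_id=%s;aws_secret_access_key=%s" % (access_key, secret_key)
    sql = ("COPY %s FROM 's3://%s/%s' CREDENTIALS '%s' CSV DELIMITER '%s' QUOTE '%s'"
           % (table, bucket, key, creds, delim, quote))
    if gzipped:
        # COPY needs the GZIP keyword when the staged object is compressed
        sql += " GZIP"
    return sql
```

For the example later on this page, `build_copy_sql("test2", "test123", "data.csv.gz", ..., gzipped=True)` would produce a COPY of `s3://test123/data.csv.gz` into table `test2`.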
&lt;h2 id="audience"&gt;Audience&lt;/h2&gt;
&lt;p&gt;Database/ETL developers, Data Integrators, Data Engineers, Business Analysts, AWS Developers, DevOps.&lt;/p&gt;
&lt;h2 id="designated-environment"&gt;Designated Environment&lt;/h2&gt;
&lt;p&gt;Pre-Prod (UAT/QA/DEV)&lt;/p&gt;
&lt;h2 id="usage"&gt;Usage&lt;/h2&gt;
&lt;div class="codehilite"&gt;&lt;pre&gt;## Load CSV file to Amazon Redshift table.
##
## Load % progress outputs to the screen.
##
Usage:  
  set AWS_ACCESS_KEY_ID=&amp;lt;your access key&amp;gt;
  set AWS_SECRET_ACCESS_KEY=&amp;lt;your secret key&amp;gt;
  set REDSHIFT_CONNECT_STRING="dbname='***' port='5439' user='***' password='***' host='mycluster.***.redshift.amazonaws.com'"  
  csv_loader_for_redshift.py &amp;lt;file_to_transfer&amp;gt; &amp;lt;bucket_name&amp;gt; [&amp;lt;use_rr&amp;gt;] [&amp;lt;public&amp;gt;]
                         [&amp;lt;delim&amp;gt;] [&amp;lt;quote&amp;gt;] [&amp;lt;to_table&amp;gt;] [&amp;lt;gzip_source_file&amp;gt;]

    --use_rr -- Use reduced redundancy storage (False).
    --public -- Make uploaded files public (False).
    --delim  -- CSV file delimiter (',').
    --quote  -- CSV quote ('"').
    --to_table  -- Target Amazon-Redshift table name.
    --gzip_source_file  -- gzip input CSV file before upload to Amazon-S3 (False).

    Input filename will be used for S3 key name.

    Boto S3 docs: http://boto.cloudhackers.com/en/latest/ref/s3.html
    psycopg2 docs: http://initd.org/psycopg/docs/

&lt;/pre&gt;&lt;/div&gt;


&lt;h1 id="example"&gt;Example&lt;/h1&gt;
&lt;h3 id="environment-variables"&gt;Environment variables&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Set the following environment variables (for all tests):&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="codehilite"&gt;&lt;pre&gt;set AWS_ACCESS_KEY_ID=&amp;lt;you access key&amp;gt;
set AWS_SECRET_ACCESS_KEY=&amp;lt;you secret key&amp;gt;

set REDSHIFT_CONNECT_STRING="dbname='***' port='5439' user='***' password='***' host='mycluster.***.redshift.amazonaws.com'"  
&lt;/pre&gt;&lt;/div&gt;
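The variable names above are the ones the tool documents; a script consuming them would presumably read them via `os.environ`. A minimal sketch (the helper itself is illustrative, not the tool's code):

```python
import os

def get_connection_settings():
    """Read the settings exported by the batch scripts above.

    REDSHIFT_CONNECT_STRING is a psycopg2-style DSN; raises KeyError
    if any of the three variables was not set in the shell.
    """
    return {
        "access_key": os.environ["AWS_ACCESS_KEY_ID"],
        "secret_key": os.environ["AWS_SECRET_ACCESS_KEY"],
        "dsn": os.environ["REDSHIFT_CONNECT_STRING"],
    }
```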


&lt;h3 id="csv-file-upload-into-redshift-table-test2"&gt;CSV file upload into Redshift table &lt;code&gt;test2&lt;/code&gt;&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;examples\Load_CSV_To_Redshift_Table.bat&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="codehilite"&gt;&lt;pre&gt;set AWS_ACCESS_KEY_ID=&amp;lt;you access key&amp;gt;
set AWS_SECRET_ACCESS_KEY=&amp;lt;you secret key&amp;gt;
set REDSHIFT_CONNECT_STRING="dbname='***' port='5439' user='***' password='***' host='mycluster.***.redshift.amazonaws.com'"  

cd c:\tmp\CSV_Loader
csv_loader_for_redshift.exe c:\tmp\data.csv test123 -r -d "," -t test2 -z
&lt;/pre&gt;&lt;/div&gt;


&lt;ul&gt;
&lt;li&gt;result.log (Load_CSV_To_Redshift_Table.bat &amp;gt; result.log)&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="codehilite"&gt;&lt;pre&gt;S3        | data.csv.gz | 100%
Redshift  | test2       | DONE
Time elapsed: 5.7 seconds
&lt;/pre&gt;&lt;/div&gt;


&lt;h2 id="test-prerequisits"&gt;Test prerequisits.&lt;/h2&gt;
&lt;h4 id="target-redshift-table-ddl"&gt;Target Redshift table DDL&lt;/h4&gt;
&lt;div class="codehilite"&gt;&lt;pre&gt;CREATE TABLE test2 (id integer , num integer, data varchar,num2 integer, data2 varchar,num3 
integer, data3 varchar,num4 integer, data4 varchar);
&lt;/pre&gt;&lt;/div&gt;


&lt;h4 id="test-data"&gt;Test data&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;Test data is in file examples\data.csv&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="sources"&gt;Sources&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Will be added as soon as I clean them up and remove all the passwords and AWS keys. :-)&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="download"&gt;Download&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;git clone https://github.com/alexbuz/CSV_Loader_For_Redshift&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="" href="https://github.com/alexbuz/CSV_Loader_For_Redshift/archive/master.zip" rel="nofollow"&gt;Master Release&lt;/a&gt; -- &lt;code&gt;csv_loader_for_redshift 0.1.0&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h1 id="faq"&gt;FAQ&lt;/h1&gt;
&lt;h4 id="can-it-load-csv-file-from-windows-desktop-to-amazon-redshift"&gt;Can it load CSV file from Windows desktop to Amazon Redshift.&lt;/h4&gt;
&lt;p&gt;Yes, it is the main purpose of this tool.&lt;/p&gt;
&lt;h4 id="can-developers-integrate-csv-loader-into-their-etl-pipelines"&gt;Can developers integrate CSV loader into their ETL pipelines?&lt;/h4&gt;
&lt;p&gt;Yes, assuming they run it on Windows.&lt;/p&gt;
&lt;h4 id="how-fast-is-data-upload-using-csv-loader-for-redshift"&gt;How fast is data upload using &lt;code&gt;CSV Loader for Redshift&lt;/code&gt;?&lt;/h4&gt;
&lt;p&gt;Upload speed is bounded by the AWS APIs themselves (boto's S3 upload and Redshift's COPY); the tool adds no significant overhead of its own.&lt;/p&gt;
&lt;h4 id="how-to-inscease-upload-speed"&gt;How to inscease upload speed?&lt;/h4&gt;
&lt;p&gt;Compress input file or provide &lt;code&gt;-z&lt;/code&gt; or &lt;code&gt;--gzip_source_file&lt;/code&gt; arg in command line and this tool will compress it for you before upload to S3.&lt;/p&gt;
&lt;h4 id="what-are-the-other-ways-to-upload-file-to-redshift"&gt;What are the other ways to upload file to Redshift?&lt;/h4&gt;
&lt;p&gt;You can use the &lt;code&gt;aws s3api&lt;/code&gt; CLI to stage the file and then run the &lt;code&gt;COPY&lt;/code&gt; command from psql to achieve much the same result.&lt;/p&gt;
&lt;h4 id="can-i-just-zip-it-using-windows-file-explorer"&gt;Can I just zip it using Windows File Explorer?&lt;/h4&gt;
&lt;p&gt;No, Redshift will not recognize the *.zip file format.&lt;br/&gt;
You have to &lt;code&gt;gzip&lt;/code&gt; it. You can use 7-Zip to do that.&lt;/p&gt;
&lt;h4 id="does-it-delete-file-from-s3-after-upload"&gt;Does it delete file from S3 after upload?&lt;/h4&gt;
&lt;p&gt;No.&lt;/p&gt;
&lt;h4 id="does-it-create-target-redshift-table"&gt;Does it create target Redshift table?&lt;/h4&gt;
&lt;p&gt;No.&lt;/p&gt;
&lt;h4 id="is-there-an-option-to-compress-input-csv-file-before-upload"&gt;Is there an option to compress input CSV file before upload?&lt;/h4&gt;
&lt;p&gt;Yes. Use &lt;code&gt;-z&lt;/code&gt; or &lt;code&gt;--gzip_source_file&lt;/code&gt; argument so the tool does compression for you.&lt;/p&gt;
&lt;h4 id="explain-first-step-of-data-load"&gt;Explain first step of data load?&lt;/h4&gt;
&lt;p&gt;The CSV file you provide is first uploaded to Amazon S3 using the credentials you set in the shell.&lt;br/&gt;
It can be uploaded compressed or uncompressed (compression is optional).&lt;br/&gt;
It does not have to be made public for the load into Redshift.&lt;/p&gt;
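The "% progress" output mentioned in the usage text maps naturally onto boto's S3 upload callback, which boto invokes with `(bytes_transmitted, total_bytes)` during `set_contents_from_filename`. A hypothetical sketch of such a callback (the helper names are mine, not the tool's):

```python
def format_progress(done, total):
    """Render one progress line, similar to the result.log sample above."""
    pct = int(done * 100 / total) if total else 100
    return "S3 | %3d%%" % pct

def percent_cb(done, total):
    # Signature matches boto's upload callback contract: called
    # periodically with (bytes_transmitted, total_bytes).
    print(format_progress(done, total))
```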
&lt;h4 id="explain-second-step-of-data-load-how-data-is-loaded-to-amazon-redshift"&gt;Explain second step of data load. How data is loaded to Amazon Redshift?&lt;/h4&gt;
&lt;p&gt;You Redshift cluster has to be open to the world (accessible via port 5439 from internet).&lt;br/&gt;
It uses PostgreSQL COPY command to load file located on S3 into Redshift table.&lt;/p&gt;
&lt;h4 id="can-i-use-winzip-or-7-zip"&gt;Can I use WinZip or 7-zip&lt;/h4&gt;
&lt;p&gt;Yes, but you have to use 'gzip' compression type.&lt;/p&gt;
&lt;h4 id="what-technology-was-used-to-create-this-tool"&gt;What technology was used to create this tool&lt;/h4&gt;
&lt;p&gt;I used Python, Boto, and psycopg2 to write it.&lt;br/&gt;
Boto is used to upload file to S3. &lt;br/&gt;
psycopg2 is used to establish ODBC connection with Redshift clusted and execute &lt;code&gt;COPY&lt;/code&gt; command.&lt;/p&gt;
&lt;h4 id="where-are-the-sources"&gt;Where are the sources?&lt;/h4&gt;
&lt;p&gt;Please, contact me for sources.&lt;/p&gt;
&lt;h4 id="can-you-modify-functionality-and-add-features"&gt;Can you modify functionality and add features?&lt;/h4&gt;
&lt;p&gt;Yes, please, ask me for new features.&lt;/p&gt;
&lt;h4 id="what-other-aws-tools-youve-created"&gt;What other AWS tools you've created?&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a class="" href="https://github.com/alexbuz/S3_Sanity_Check/blob/master/README.md" rel="nofollow"&gt;S3_Sanity_Check&lt;/a&gt; - lets you &lt;code&gt;ping&lt;/code&gt; an Amazon-S3 bucket to see if it's publicly readable.&lt;/li&gt;
&lt;li&gt;&lt;a class="" href="https://github.com/alexbuz/EC2_Metrics_Plotter/blob/master/README.md" rel="nofollow"&gt;EC2_Metrics_Plotter&lt;/a&gt; - plots any CloudWatch EC2 instance  metric stats.&lt;/li&gt;
&lt;li&gt;&lt;a class="" href="https://github.com/alexbuz/S3_File_Uploader/blob/master/README.md" rel="nofollow"&gt;S3_File_Uploader&lt;/a&gt; - uploads file from Windows to S3.&lt;/li&gt;
&lt;/ul&gt;
&lt;h4 id="do-you-have-any-aws-certifications"&gt;Do you have any AWS Certifications?&lt;/h4&gt;
&lt;p&gt;Yes, &lt;a class="" href="https://raw.githubusercontent.com/alexbuz/FAQs/master/images/AWS_Ceritied_Developer_Associate.png" rel="nofollow"&gt;AWS Certified Developer (Associate)&lt;/a&gt;&lt;/p&gt;
&lt;h4 id="can-you-create-similarcustom-data-tool-for-our-business"&gt;Can you create similar/custom data tool for our business?&lt;/h4&gt;
&lt;p&gt;Yes, you can PM me here or email at &lt;code&gt;alex_buz@yahoo.com&lt;/code&gt;.&lt;br/&gt;
I'll get back to you within hours.&lt;/p&gt;
&lt;h3 id="links"&gt;Links&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a class="" href="https://github.com/alexbuz/FAQs/blob/master/README.md" rel="nofollow"&gt;Employment FAQ&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;/div&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Alex Buzunov</dc:creator><pubDate>Thu, 07 Apr 2016 11:46:34 -0000</pubDate><guid>https://sourceforge.netb1e287cc83dfd5a9972938cc3442f109849c45b7</guid></item></channel></rss>