<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Recent changes to Oracle_To_S3_Data_Uploader</title><link>https://sourceforge.net/p/aws-data-tools/wiki/Oracle_To_S3_Data_Uploader/</link><description>Recent changes to Oracle_To_S3_Data_Uploader</description><atom:link href="https://sourceforge.net/p/aws-data-tools/wiki/Oracle_To_S3_Data_Uploader/feed" rel="self"/><language>en</language><lastBuildDate>Thu, 07 Apr 2016 12:31:04 -0000</lastBuildDate><atom:link href="https://sourceforge.net/p/aws-data-tools/wiki/Oracle_To_S3_Data_Uploader/feed" rel="self" type="application/rss+xml"/><item><title>Oracle_To_S3_Data_Uploader modified by Alex Buzunov</title><link>https://sourceforge.net/p/aws-data-tools/wiki/Oracle_To_S3_Data_Uploader/</link><description>&lt;div class="markdown_content"&gt;&lt;h1 id="oracle-to-s3-data-uploader"&gt;Oracle-to-S3 data uploader.&lt;/h1&gt;
&lt;p&gt;Lets you stream your Oracle table/query data to Amazon-S3 from the Windows command line (CLI).&lt;/p&gt;
&lt;p&gt;Features:&lt;br/&gt;
 - Streams Oracle table data to Amazon-S3.&lt;br/&gt;
 - No need to create CSV extracts before uploading to S3.&lt;br/&gt;
 - The data stream is compressed during upload to S3.&lt;br/&gt;
 - No need for the Amazon AWS CLI.&lt;br/&gt;
 - Works from your Windows desktop (command line).&lt;br/&gt;
 - Ships as an executable (Oracle_To_S3_Uploader.exe) - no Python installation needed.&lt;br/&gt;
 - 64-bit build - works from any vanilla command prompt on 64-bit Windows.&lt;br/&gt;
 - AWS access keys are not passed as arguments.&lt;br/&gt;
 - Written using Python/boto/PyInstaller.&lt;/p&gt;
&lt;h2 id="version"&gt;Version&lt;/h2&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;OS&lt;/th&gt;
&lt;th&gt;Platform&lt;/th&gt;
&lt;th&gt;Version&lt;/th&gt;
&lt;th&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Windows&lt;/td&gt;
&lt;td&gt;64bit&lt;/td&gt;
&lt;td&gt;1.2 beta&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;h2 id="purpose"&gt;Purpose&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Stream (upload) Oracle table data to Amazon-S3.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="how-it-works"&gt;How it works&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;The tool connects to the source Oracle DB and opens a data pipe for reading.&lt;/li&gt;
&lt;li&gt;Data is pumped to S3 using multipart upload.&lt;/li&gt;
&lt;li&gt;Optional upload to Reduced Redundancy storage (standard storage by default).&lt;/li&gt;
&lt;li&gt;Optionally make the file public after upload (private by default).&lt;/li&gt;
&lt;li&gt;If the target bucket does not exist, it is created.&lt;/li&gt;
&lt;li&gt;You can control the region where a new bucket is created.&lt;/li&gt;
&lt;li&gt;Streamed data can be tee'd (dumped to disk) during upload.&lt;/li&gt;
&lt;li&gt;If not set, the S3 key defaults to the query file name.&lt;/li&gt;
&lt;li&gt;It's a Python/boto script&lt;ul&gt;
&lt;li&gt;Boto S3 docs: &lt;a href="http://boto.cloudhackers.com/en/latest/ref/s3.html" rel="nofollow"&gt;http://boto.cloudhackers.com/en/latest/ref/s3.html&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;The executable is created using &lt;a href="http://www.pyinstaller.org/" rel="nofollow"&gt;PyInstaller&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
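&lt;p&gt;The streaming-compression idea above can be sketched in plain Python: data flows from the source pipe straight into a gzip compressor, so no uncompressed CSV ever has to land on disk. This is an illustrative standard-library sketch, not the tool's actual code; all names here are made up for the example:&lt;/p&gt;

```python
import gzip
import io

def compress_stream(chunks, level=9):
    # Feed byte chunks from the source pipe into a gzip compressor
    # held in memory; nothing uncompressed touches the filesystem.
    buf = io.BytesIO()
    with gzip.GzipFile(fileobj=buf, mode="wb", compresslevel=level) as gz:
        for chunk in chunks:
            gz.write(chunk)
    return buf.getvalue()

# A fake "data pipe": 1,000 pipe-delimited CSV rows.
rows = (("row%d|some_value\n" % i).encode("ascii") for i in range(1000))
payload = compress_stream(rows)
print(len(payload))  # compressed size, far smaller than the raw stream
```

&lt;p&gt;The real tool additionally buffers the compressed stream into parts for S3 multipart upload rather than holding the whole payload in memory.&lt;/p&gt;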
&lt;h2 id="audience"&gt;Audience&lt;/h2&gt;
&lt;p&gt;Database/ETL developers, Data Integrators, Data Engineers, Business Analysts, AWS Developers, DevOps.&lt;/p&gt;
&lt;h2 id="designated-environment"&gt;Designated Environment&lt;/h2&gt;
&lt;p&gt;Pre-Prod (UAT/QA/DEV)&lt;/p&gt;
&lt;h2 id="usage"&gt;Usage&lt;/h2&gt;
&lt;div class="codehilite"&gt;&lt;pre&gt;c:\Python35-32\PROJECTS\Ora2S3&amp;gt;dist\oracle_to_s3_uploader.exe
#############################################################################
#Oracle to S3 Data Uploader (v1.2, beta, 04/05/2016 15:11:53) [64bit]
#Copyright (c): 2016 Alex Buzunov, All rights reserved.
#Agreement: Use this tool at your own risk. Author is not liable for any damages
#           or losses related to the use of this software.
################################################################################
Usage:
  set AWS_ACCESS_KEY_ID=&amp;lt;your access key&amp;gt;
  set AWS_SECRET_ACCESS_KEY=&amp;lt;your secret key&amp;gt;
  set ORACLE_LOGIN=tiger/scott@orcl
  set ORACLE_CLIENT_HOME=C:\app\oracle12\product\12.1.0\dbhome_1

  oracle_to_s3_uploader.exe [&amp;lt;ora_query_file&amp;gt;] [&amp;lt;ora_col_delim&amp;gt;] [&amp;lt;ora_add_header&amp;gt;]
                            [&amp;lt;s3_bucket_name&amp;gt;] [&amp;lt;s3_key_name&amp;gt;] [&amp;lt;s3_use_rr&amp;gt;] [&amp;lt;s3_public&amp;gt;]

        --ora_query_file -- SQL query to execute in the source Oracle db.
        --ora_col_delim  -- CSV column delimiter (|).
        --ora_add_header -- Add header line to CSV file (False).
        --ora_lame_duck  -- Limit rows for trial upload (1000).
        --create_data_dump -- Use it if you want to persist streamed data on your filesystem.

        --s3_bucket_name -- S3 bucket name (always set it).
        --s3_location    -- New bucket location name (us-west-2)
                                Set it if you are creating a new bucket.
        --s3_key_name    -- CSV file name (to store query results on S3).
                if &amp;lt;s3_key_name&amp;gt; is not specified, the oracle query filename (ora_query_file) will be used.
        --s3_use_rr -- Use reduced redundancy storage (False).
        --s3_write_chunk_size -- Chunk size for multipart upload to S3 (10&amp;lt;&amp;lt;21, ~20MB).
        --s3_public -- Make uploaded file public (False).

        Oracle data uploaded to S3 is always compressed (gzip).
&lt;/pre&gt;&lt;/div&gt;
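&lt;p&gt;The credential handling shown above (keys read from the environment, never passed as arguments) can be sketched as follows. Function name and error message are illustrative assumptions, not the tool's internals:&lt;/p&gt;

```python
import os

def read_aws_credentials(env=os.environ):
    # Keys come from the environment, never from argv, so they do
    # not leak into shell history or process listings.
    key_id = env.get("AWS_ACCESS_KEY_ID")
    secret = env.get("AWS_SECRET_ACCESS_KEY")
    if not key_id or not secret:
        raise RuntimeError(
            "set AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY first")
    return key_id, secret

# Illustrative environment, mirroring the 'set ...' lines above.
fake_env = {"AWS_ACCESS_KEY_ID": "AKIAEXAMPLE",
            "AWS_SECRET_ACCESS_KEY": "secretexample"}
print(read_aws_credentials(fake_env))
```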


&lt;h1 id="example"&gt;Example&lt;/h1&gt;
&lt;h3 id="environment-variables"&gt;Environment variables&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Set the following environment variables (for all tests):&lt;br/&gt;
set_env.bat:&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="codehilite"&gt;&lt;pre&gt;set AWS_ACCESS_KEY_ID=&amp;lt;you access key&amp;gt;
set AWS_SECRET_ACCESS_KEY=&amp;lt;you secret key&amp;gt;

set ORACLE_LOGIN=tiger/scott@orcl
set ORACLE_CLIENT_HOME=C:\app\oracle12\product\12.1.0\dbhome_1
&lt;/pre&gt;&lt;/div&gt;


&lt;h3 id="test-upload-with-data-dump"&gt;Test upload with data dump.&lt;/h3&gt;
&lt;p&gt;In this example the complete table &lt;code&gt;test2&lt;/code&gt; gets uploaded to Amazon-S3 as a compressed CSV file.&lt;/p&gt;
&lt;p&gt;Contents of the file &lt;em&gt;table_query.sql&lt;/em&gt;:&lt;/p&gt;
&lt;div class="codehilite"&gt;&lt;pre&gt;SELECT * FROM test2;
&lt;/pre&gt;&lt;/div&gt;


&lt;p&gt;A temporary dump file is also created for analysis (by default no files are created).&lt;br/&gt;
Use &lt;code&gt;-s, --create_data_dump&lt;/code&gt; to dump streamed data.&lt;/p&gt;
&lt;p&gt;If the target bucket does not exist, it will be created in a region you control.&lt;br/&gt;
Use the &lt;code&gt;-t, --s3_location&lt;/code&gt; argument to set the target region name.&lt;/p&gt;
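&lt;p&gt;The key-naming fallback (use the query file's base name when &lt;code&gt;--s3_key_name&lt;/code&gt; is not given) can be sketched like this. The timestamp suffix mirrors the dump-file name in the example run and is an assumption about the naming convention, not confirmed internals:&lt;/p&gt;

```python
import os
import time

def default_s3_key(ora_query_file, s3_key_name=None, when=None):
    # If no explicit key is given, fall back to the query file's
    # base name; append a timestamp like the dump-file names in
    # the example run below.
    base = s3_key_name or os.path.splitext(os.path.basename(ora_query_file))[0]
    stamp = time.strftime("%Y%m%d_%H%M%S", when or time.localtime())
    return "%s.%s.gz" % (base, stamp)

fixed = time.strptime("20160405_235310", "%Y%m%d_%H%M%S")
print(default_s3_key("table_query.sql", when=fixed))
# table_query.20160405_235310.gz
```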
&lt;p&gt;Contents of the file &lt;em&gt;test.bat&lt;/em&gt;:&lt;/p&gt;
&lt;div class="codehilite"&gt;&lt;pre&gt;dist\oracle_to_s3_uploader.exe ^
    -q table_query.sql ^
    -d "|" ^
    -e ^
    -b test_bucket ^
    -k oracle_table_export ^
    -r ^
    -p ^
    -s
&lt;/pre&gt;&lt;/div&gt;


&lt;p&gt;Executing &lt;code&gt;test.bat&lt;/code&gt;:&lt;/p&gt;
&lt;div class="codehilite"&gt;&lt;pre&gt;c:\Python35-32\PROJECTS\Ora2S3&amp;gt;dist\oracle_to_s3_uploader.exe   -q table_query.sql      -d "|"  -e      -b test_bucket       -k oracle_table_export  -r      -p      -s
Uploading results of "table_query.sql" to existing bucket "test_bucket"
Dumping data to: c:\Python35-32\PROJECTS\Ora2S3\data_dump\table_query\test_bucket\oracle_table_export.20160405_235310.gz
1 chunk 10.0 GB [8.95 sec]
2 chunk 5.94 GB [5.37 sec]
Uncompressed data size: 15.94 GB
Compressed data size: 63.39 MB
Upload complete (17.58 sec).
Your PUBLIC upload is at: https://s3-us-west-2.amazonaws.com/test_bucket/oracle_table_export.gz
&lt;/pre&gt;&lt;/div&gt;
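&lt;p&gt;As a back-of-the-envelope check on the run above: 15.94 GB of raw CSV shrank to 63.39 MB on the wire, a compression ratio of roughly 257x, which is why so little data actually crosses the network:&lt;/p&gt;

```python
def compression_ratio(uncompressed_gb, compressed_mb):
    # Convert GB to MB (1 GB = 1024 MB), then divide raw by compressed.
    return (uncompressed_gb * 1024.0) / compressed_mb

print(round(compression_ratio(15.94, 63.39)))  # 257
```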


&lt;p&gt;&lt;img alt="Test results" rel="nofollow" src="https://raw.githubusercontent.com/alexbuz/Oracle_To_S3_Data_Uploader/master/dist-64bit/ora_to_s3_upload.png" title="Test Results"/&gt;&lt;/p&gt;
&lt;h3 id="download"&gt;Download&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;git clone https://github.com/alexbuz/Oracle_To_S3_Data_Uploader&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="" href="https://github.com/alexbuz/Oracle_To_S3_Data_Uploader/archive/master.zip" rel="nofollow"&gt;Master Release&lt;/a&gt; -- &lt;code&gt;oracle_to_s3_uploader 1.2&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h1 id="faq"&gt;FAQ&lt;/h1&gt;
&lt;h4 id="can-it-load-oracle-data-to-amazon-s3-file"&gt;Can it load Oracle data to Amazon S3 file?&lt;/h4&gt;
&lt;p&gt;Yes, that is the main purpose of this tool.&lt;/p&gt;
&lt;h4 id="can-developers-integrate-oracle_to_s3_data_uploader-into-their-etl-pipelines"&gt;Can developers integrate &lt;code&gt;Oracle_To_S3_Data_Uploader&lt;/code&gt; into their ETL pipelines?&lt;/h4&gt;
&lt;p&gt;Yes, assuming they run it on Windows.&lt;/p&gt;
&lt;h4 id="how-fast-is-data-upload-using-csv-loader-for-redshift"&gt;How fast is data upload using &lt;code&gt;CSV Loader for Redshift&lt;/code&gt;?&lt;/h4&gt;
&lt;p&gt;As fast as any implementation of multi-part load using Python and boto.&lt;/p&gt;
&lt;h4 id="how-to-inscease-upload-speed"&gt;How to inscease upload speed?&lt;/h4&gt;
&lt;p&gt;The input data stream is compressed before upload to S3, so there is not much to tune here.&lt;br/&gt;
You may want to run the tool closer to the source or target for better performance.&lt;/p&gt;
&lt;h4 id="what-are-the-other-ways-to-move-large-amounts-of-data-from-oracle-to-s3"&gt;What are the other ways to move large amounts of data from Oracle to S3?&lt;/h4&gt;
&lt;p&gt;You can write a Sqoop script and schedule it as an 'EMR Activity' under AWS Data Pipeline.&lt;/p&gt;
&lt;h4 id="does-it-create-temporary-data-file-to-facilitate-data-load-to-s3"&gt;Does it create temporary data file to facilitate data load to S3?&lt;/h4&gt;
&lt;p&gt;No.&lt;/p&gt;
&lt;h4 id="can-i-log-transfered-data-for-analysis"&gt;Can I log transfered data for analysis?&lt;/h4&gt;
&lt;p&gt;Yes, Use &lt;code&gt;-s, --create_data_dump&lt;/code&gt; to dump streamed data.&lt;/p&gt;
&lt;h4 id="explain-first-step-of-data-transfer"&gt;Explain first step of data transfer?&lt;/h4&gt;
&lt;p&gt;The query file you provide is used to select data from the source Oracle server.&lt;br/&gt;
The stream is compressed before upload to S3.&lt;/p&gt;
&lt;h4 id="explain-second-step-of-data-transfer"&gt;Explain second step of data transfer?&lt;/h4&gt;
&lt;p&gt;Compressed data is uploaded to S3 using the multipart upload protocol.&lt;/p&gt;
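&lt;p&gt;The bookkeeping behind that multipart step can be sketched as follows. The default chunk size matches the documented &lt;code&gt;--s3_write_chunk_size&lt;/code&gt; default of about 20 MB; the actual transfer is done by boto's multipart calls (e.g. &lt;code&gt;initiate_multipart_upload&lt;/code&gt; / &lt;code&gt;upload_part_from_file&lt;/code&gt;), which this sketch deliberately leaves out:&lt;/p&gt;

```python
def plan_parts(total_bytes, chunk_size=10 * 2 ** 21):
    # 10 * 2**21 bytes == 20,971,520, the documented ~20 MB default.
    # Each part is (part_number, byte_offset, part_size).
    parts = []
    for num, offset in enumerate(range(0, total_bytes, chunk_size), start=1):
        size = min(chunk_size, total_bytes - offset)
        parts.append((num, offset, size))
    return parts

# The 63.39 MB compressed payload from the example run splits into
# 4 parts at the default chunk size.
print(len(plan_parts(int(63.39 * 2 ** 20))))  # 4
```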
&lt;h4 id="what-technology-was-used-to-create-this-tool"&gt;What technology was used to create this tool&lt;/h4&gt;
&lt;p&gt;It was written with SQL*Plus, Python, and boto.&lt;br/&gt;
Boto is used to upload the file to S3.&lt;br/&gt;
SQL*Plus is used to spool data to the compressor pipe.&lt;/p&gt;
&lt;h4 id="where-are-the-sources"&gt;Where are the sources?&lt;/h4&gt;
&lt;p&gt;Please, contact me for sources.&lt;/p&gt;
&lt;h4 id="can-you-modify-functionality-and-add-features"&gt;Can you modify functionality and add features?&lt;/h4&gt;
&lt;p&gt;Yes, please, ask me for new features.&lt;/p&gt;
&lt;h4 id="what-other-aws-tools-youve-created"&gt;What other AWS tools you've created?&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a class="alink" href="/p/aws-data-tools/wiki/CSV_Loader_For_Redshift/"&gt;[CSV_Loader_For_Redshift]&lt;/a&gt; (https://github.com/alexbuz/CSV_Loader_For_Redshift/blob/master/README.md) - Append CSV data to Amazon-Redshift from Windows.&lt;/li&gt;
&lt;li&gt;&lt;a class="alink" href="/p/aws-data-tools/wiki/S3_Sanity_Check/"&gt;[S3_Sanity_Check]&lt;/a&gt; (https://github.com/alexbuz/S3_Sanity_Check/blob/master/README.md) - let's you &lt;code&gt;ping&lt;/code&gt; Amazon-S3 bucket to see if it's publicly readable.&lt;/li&gt;
&lt;li&gt;&lt;a class="" href="https://github.com/alexbuz/EC2_Metrics_Plotter/blob/master/README.md" rel="nofollow"&gt;EC2_Metrics_Plotter&lt;/a&gt; - plots any CloudWatch EC2 instance  metric stats.&lt;/li&gt;
&lt;li&gt;&lt;a class="" href="https://github.com/alexbuz/S3_File_Uploader/blob/master/README.md" rel="nofollow"&gt;S3_File_Uploader&lt;/a&gt; - uploads file from Windows to S3.&lt;/li&gt;
&lt;/ul&gt;
&lt;h4 id="do-you-have-any-aws-certifications"&gt;Do you have any AWS Certifications?&lt;/h4&gt;
&lt;p&gt;Yes, &lt;a class="" href="https://raw.githubusercontent.com/alexbuz/FAQs/master/images/AWS_Ceritied_Developer_Associate.png" rel="nofollow"&gt;AWS Certified Developer (Associate)&lt;/a&gt;&lt;/p&gt;
&lt;h4 id="can-you-create-similarcustom-data-tool-for-our-business"&gt;Can you create similar/custom data tool for our business?&lt;/h4&gt;
&lt;p&gt;Yes, you can PM me here or email at &lt;code&gt;alex_buz@yahoo.com&lt;/code&gt;.&lt;br/&gt;
I'll get back to you within hours.&lt;/p&gt;
&lt;h3 id="links"&gt;Links&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a class="" href="https://github.com/alexbuz/FAQs/blob/master/README.md" rel="nofollow"&gt;Employment FAQ&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;/div&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Alex Buzunov</dc:creator><pubDate>Thu, 07 Apr 2016 12:31:04 -0000</pubDate><guid>https://sourceforge.net0bce35fd4f138dbdaed4ab47b9212e6fd0a24e52</guid></item></channel></rss>