Thank You SourceForge for the Rising Star Award

Krzysztof Nowicki — Thu, 10 Jul 2025 08:41:12 -0000

We’re happy to share that DocWire SDK has received the Rising Star Award here on SourceForge.

It’s another milestone for us, a sign that the project is gaining real momentum and solving problems for developers working with complex documents and data pipelines in C++.

DocWire SDK is a modern data processing toolkit built for performance, portability, and clean C++ design. It’s already being used in production by companies in AI, digital forensics, and consulting — and we’re looking to grow further.

If you’re:

- Building in C++ and want to contribute to an active project
- A developer or company who could use DocWire in your stack
- Interested in partnering or collaborating with our team

We’d love to hear from you.

Thanks for checking us out — and for being part of the open-source world that makes this possible.

--The DocWire Team--

DocWire SDK – A Journey of Innovation in Data Extraction 2024 - 2025

Krzysztof Nowicki — Wed, 19 Feb 2025 11:50:49 -0000

Empowering C++ Developers with Cutting-Edge Data Processing

Over the past year, DocWire SDK has rapidly evolved, bringing powerful data extraction, parsing, and content processing capabilities to C++ developers worldwide. From foundational improvements in performance and stability to advanced AI-driven text analysis, our SDK has grown into an essential tool for anyone dealing with structured and unstructured data.

2025: The Next Leap in Data Extraction

Latest Release (Jan 2025) – Smarter Content Type Detection
With our latest update, DocWire SDK introduces enhanced file format recognition powered by file signatures, improving accuracy when dealing with diverse document types. A redesigned parsing chain now allows developers to extend functionality using operator|=, making data extraction workflows more modular and flexible.

Key Features:
Content type detection based on file signatures
Refactored file format detection API for a cleaner and more maintainable codebase
Optimized parsing chain with easy-to-use operators for smoother data processing
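
To illustrate what signature-based detection means in practice, here is a minimal sketch. It is not DocWire's implementation: the names Signature, kSignatures and detect_mime are hypothetical, and the table holds only a few well-known magic byte sequences.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string_view>

using namespace std::string_view_literals;

// Hypothetical signature table entry: leading magic bytes mapped to a MIME type.
struct Signature {
    std::string_view magic;
    std::string_view mime;
};

constexpr std::array<Signature, 4> kSignatures{{
    {"%PDF-"sv, "application/pdf"sv},
    {"PK\x03\x04"sv, "application/zip"sv},  // ZIP container, also used by OOXML and ODF
    {"\x89PNG\r\n\x1a\n"sv, "image/png"sv},
    {"{\\rtf"sv, "application/rtf"sv},
}};

// Returns the MIME type whose signature matches the start of `head`,
// or a generic binary type when nothing matches.
std::string_view detect_mime(std::string_view head) {
    for (const auto& sig : kSignatures) {
        if (head.size() >= sig.magic.size() &&
            head.compare(0, sig.magic.size(), sig.magic) == 0) {
            return sig.mime;
        }
    }
    return "application/octet-stream";
}
```

Calling detect_mime on the first bytes of a file is enough for formats with a unique prefix. A ZIP signature only identifies the container, so telling DOCX from XLSX or ODT would need a further look inside the archive.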

2024: Building a Robust Foundation

Dec 2024 – Better Error Handling & Stability
Handling non-fatal errors and streamlining exception reporting has been a major focus. This update introduced improved error handling in OCR and XML processing, ensuring that partial failures don’t interrupt critical workflows. Developers now get better debugging insights and more structured error messages.
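
The idea of a partial failure that does not interrupt the workflow can be sketched as follows. This is an illustration under our own assumptions, not the SDK's error-handling API: PartialResult and run_all are hypothetical names, and each step stands in for something like OCR on one page.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical result type: successful output plus the non-fatal problems seen.
struct PartialResult {
    std::vector<std::string> texts;     // output of the steps that succeeded
    std::vector<std::string> warnings;  // messages from the steps that failed
};

// Runs every step; a throwing step is recorded as a warning instead of
// aborting the whole run.
PartialResult run_all(const std::vector<std::function<std::string()>>& steps) {
    PartialResult result;
    for (std::size_t i = 0; i < steps.size(); ++i) {
        try {
            result.texts.push_back(steps[i]());
        } catch (const std::exception& e) {
            result.warnings.push_back("step " + std::to_string(i) + ": " + e.what());
        }
    }
    return result;
}
```

The caller gets everything that could be extracted and can still inspect the warnings afterwards to decide whether the result is good enough.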

Nov 2024 – Faster Compilation & Modular Logging
To make the SDK more maintainable, we introduced header file optimizations, decoupled logging functionalities, and reduced unnecessary dependencies, resulting in faster builds and better modularity.

Oct 2024 – C++20 Functional Chaining & Improved XML Parsing
The introduction of function chaining allows developers to write cleaner, composable code when processing documents. Enhancements in XML parsing logic ensure smoother handling of nested document structures.
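
For readers unfamiliar with the style, function chaining can be sketched like this. The sketch only shows the general technique of composing steps with operator|; Step, step, to_upper and exclaim are hypothetical names and do not come from the DocWire API. It is kept to C++17 features so it compiles with older toolchains as well.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <utility>

// Hypothetical wrapper that marks a callable as a chain step.
template <typename F>
struct Step {
    F fn;
};

template <typename F>
Step<F> step(F fn) {
    return Step<F>{std::move(fn)};
}

// `value | step` applies the step to the value and returns the result,
// so several steps can be written left to right.
template <typename T, typename F>
auto operator|(T&& value, const Step<F>& s) {
    return s.fn(std::forward<T>(value));
}

const auto to_upper = step([](std::string s) {
    for (char& c : s) {
        c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    }
    return s;
});

const auto exclaim = step([](std::string s) { return s + "!"; });
```

With these definitions, std::string("docwire") | to_upper | exclaim reads in the order the data flows and yields "DOCWIRE!".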

Going Beyond Extraction – AI & NLP Integration

One of the biggest milestones for DocWire SDK was the July 2024 release, where we introduced local AI model execution for tasks such as:

Text classification, summarization, translation, and sentiment analysis – all running natively in C++ without external dependencies.
Fuzzy string matching for smarter search and data comparison.
Enhanced dependency management for integrating third-party libraries seamlessly.
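
As background on fuzzy string matching: one common similarity measure is the Levenshtein edit distance, sketched below. This post does not say which algorithm the SDK uses, so treat this purely as an illustration of the concept; edit_distance is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string_view>
#include <vector>

// Classic Levenshtein edit distance (insert, delete, substitute each cost 1),
// computed with a single rolling row of the dynamic programming table.
std::size_t edit_distance(std::string_view a, std::string_view b) {
    std::vector<std::size_t> row(b.size() + 1);
    for (std::size_t j = 0; j <= b.size(); ++j) row[j] = j;

    for (std::size_t i = 1; i <= a.size(); ++i) {
        std::size_t diag = row[0];  // value of cell (i-1, j-1)
        row[0] = i;
        for (std::size_t j = 1; j <= b.size(); ++j) {
            const std::size_t above = row[j];  // value of cell (i-1, j)
            const std::size_t cost = (a[i - 1] == b[j - 1]) ? 0 : 1;
            row[j] = std::min({above + 1, row[j - 1] + 1, diag + cost});
            diag = above;
        }
    }
    return row[b.size()];
}
```

A small distance relative to the string length means the two strings are close, which is what makes tolerant search and comparison of slightly different spellings possible.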

This made DocWire SDK a powerful tool for AI-powered data extraction and text processing directly in C++ applications, while maintaining full control over privacy and performance.

Why Choose DocWire SDK?

Flexible & Modular – Extensible API with C++20-friendly function chaining and modernized architecture.
High-Performance Parsing – Optimized file format detection, improved memory management, and low-latency text processing.
Multi-Format Support – Handles XML, RTF, OOXML, PDFs, Emails, and more with OCR capabilities.
Developer-Friendly – Well-documented API, detailed error reporting, and support for major C++ build systems.
Privacy-Focused AI – Process natural language directly on-device, without relying on cloud-based services.
Extensive File Format Support – DocWire SDK processes a wide range of file types, including XML, RTF, OOXML, PDFs, Emails, and almost 100 more, with further formats planned for future updates.

What’s Next?

We are continuously refining DocWire SDK to offer smarter, faster, and more reliable data extraction capabilities. Future updates will focus on performance optimizations, wider document support, and even more intuitive API improvements.

Try DocWire SDK Today!

Whether you're processing large-scale documents, building AI-powered applications, or need a high-performance parsing engine for C++, DocWire SDK is here to streamline your workflow.

Download the latest release on SourceForge, or find us on GitHub:

https://github.com/docwire/docwire/releases/tag/2025.01.22

Latest and the greatest

Krzysztof — Thu, 25 Jul 2024 17:38:59 -0000

We’ve been busy making things faster, smoother, and all-around better for you. Here’s the lowdown:

Big Overhaul & Performance Boosts:
- Speed Enhancements: New caching and memory management tricks mean everything runs a lot faster.
- PST Parser Fix: Fixed a bug that was limiting mail processing – no more data loss!
- Exporters Enhanced: Added metadata support in HTML exporters and fixed issues in the EML parser for more accurate exports.
- Modern C++ Practices: We’ve upgraded to use move semantics and smart pointers, making our code safer and faster. No more unnecessary copying – it’s lean and mean now!
- Simplified Code: We cleaned house by getting rid of outdated stuff like ParserWrapper and wrapper_parser_creator. Now, it’s much easier to understand and work with.
- Better Parsing: We moved parsing duties from individual parsers to the Importer class. This makes everything more streamlined and independent.
- PDFParser Upgrade: We replaced manual std::mutex lock() and unlock() calls with std::lock_guard for simpler and safer code.
- Bye-Bye Old Code: Out with the old! We replaced the FormattingStyle class and std::bind with shiny new lambda expressions.
- Performance Comparison Tool: New script to compare SDK performance – great for making sure we’re always improving.
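
To show why the move from manual lock() and unlock() calls to std::lock_guard matters, here is a small generic sketch. It is not taken from PDFParser; g_mutex, g_pages_parsed and record_page are hypothetical names.

```cpp
#include <cassert>
#include <mutex>
#include <stdexcept>

std::mutex g_mutex;      // hypothetical shared state, for illustration only
int g_pages_parsed = 0;

// Manual style, shown for contrast: an exception thrown between the two calls
// would skip unlock() and leave the mutex locked.
//   g_mutex.lock();
//   ...work that may throw...
//   g_mutex.unlock();

// RAII style: the guard's destructor unlocks on every exit path,
// including when an exception propagates out of the function.
void record_page(bool fail) {
    std::lock_guard<std::mutex> guard(g_mutex);
    if (fail) throw std::runtime_error("parse error");
    ++g_pages_parsed;
}
```

After record_page(true) throws, a later record_page(false) still acquires the mutex, because the guard released it during stack unwinding.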

New OS Support & More:
- OS Love: Added support for macOS 13, macOS 14, and Ubuntu 24.04 in our workflows.
- Better data_source Class: Now handles std::vector<std::byte>, std::span<const std::byte>, and std::string_view – super flexible!
- Error Checks Improved: Enhanced validation in table structures to prevent errors in plain_text_writer.
- New Tests: Added tons of new test cases for various input data sources in api_tests.cpp.
- Workflow Updates: Updated OS configurations in .github/workflows/build.yml – keeping things fresh and current.
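
The idea behind a source class that accepts several in-memory input types can be sketched as below. ByteSource is a hypothetical stand-in and not DocWire's data_source; to stay compilable as C++17 it covers std::vector<std::byte> and std::string_view only, and a std::span<const std::byte> overload would follow the same pattern under C++20.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical stand-in: normalizes different in-memory inputs to one owned
// byte buffer so that later stages only deal with a single representation.
class ByteSource {
public:
    explicit ByteSource(std::vector<std::byte> bytes) : bytes_(std::move(bytes)) {}

    explicit ByteSource(std::string_view text) {
        bytes_.reserve(text.size());
        for (char c : text) {
            bytes_.push_back(static_cast<std::byte>(static_cast<unsigned char>(c)));
        }
    }

    std::size_t size() const { return bytes_.size(); }

    // Copies the buffer back out as text, e.g. for text-based parsers.
    std::string as_string() const {
        std::string out;
        out.reserve(bytes_.size());
        for (std::byte b : bytes_) {
            out.push_back(static_cast<char>(std::to_integer<unsigned char>(b)));
        }
        return out;
    }

private:
    std::vector<std::byte> bytes_;
};
```

Whether the data arrives as raw bytes or as text, the consumer sees the same interface, which is the flexibility the bullet above refers to.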

We’re excited about these changes and can’t wait for you to dive in and see how they improve your experience. Check out all the details on our GitHub page.

Let us know what you think, and stay tuned for more cool updates, because they’re coming very soon!

Cheers,
The DocWire Team

DocToText Data Extraction SDK 5.0.9

Ferid Obeidat — Thu, 06 Jul 2023 11:05:13 -0000

Introducing the New Version of DocToText 5.0.9 SDK: Enhanced Features for Effortless Data Processing

We are thrilled to announce the release of the latest version of DocWire’s data extraction SDK. This version is packed with powerful features that streamline extracting, importing, and exporting various data types. Let’s dive into the features that make this version a game-changer in the field of data extraction.

Versatile Data Transformation Capabilities: DocToText 5.0.9 empowers users to transform data between import and export, enabling seamless filtering, aggregating, and other data manipulation tasks. This flexibility ensures that extracted data can be tailored to meet specific requirements, making it a valuable asset for a wide range of applications.

DocWire’s DocToText 5.0.9 comes equipped with multiple importers:
Microsoft Office new Office Open XML (OOXML): DOCX, XLSX, PPTX files
Microsoft Office old binary formats: DOC, XLS, XLSB, PPT files
OpenOffice/LibreOffice Open Document Format (ODF): ODT, ODS, ODP files
Portable Document Format: PDF files
Webpages: HTML, HTM and CSS files
Rich Text Format: RTF files
Email formats with attachments: EML files, MS Outlook PST, OST files
Image formats: JPG, JPEG, JFIF, BMP, PNM, PNG, TIFF, WEBP with OCR capabilities
Apple iWork: PAGES, NUMBERS, KEYNOTE files
ODFXML (FODP, FODS, FODT)
Scripts and source codes: ASM, ASP, ASPX, BAS, BAT, C, CC, CMAKE, CS, CPP, CXX, D, F, FPP, FS, GO, H, HPP, HXX, JAVA, JS, JSP, LUA, PAS, PHP, PL, PERL, PY, R, SH, TCL, VB, VBS, WS files
XML format family: XML, XSD, XSL files
Comma-Separated Values: CSV files
Other structured text formats: JSON, YML, YAML, RSS, CONF files
Other unstructured text formats: MD, LOG files
DICOM (DCM) as an additional commercial plugin

Enhanced Exporting Options: DocToText 5.0.9 offers expanded exporting capabilities that allow effortless export of extracted data in various formats: users can choose plain text, HTML, XLSX, or CSV.

Superior Optical Character Recognition (OCR): DocToText is equipped with a high-grade, scriptable, and trainable OCR engine that utilizes LSTM neural networks for character recognition. This state-of-the-art OCR technology ensures accurate extraction of text from images, making it an invaluable tool for processing scanned documents, images, and other OCR-enabled files.

Enhanced Parsing Performance: In order to enhance user experience and double down on efficiency, our SDK now supports incremental parsing, which means delivering data as soon as it becomes available. This allows users to handle data in real-time, reducing processing time and enabling faster insights.
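
Incremental parsing, as described above, can be pictured with a callback that fires for each piece of content as soon as it has been read. The sketch below is a simplified illustration, not the SDK's interface: parse_incrementally is a hypothetical function that treats every non-empty line of a stream as one paragraph.

```cpp
#include <cassert>
#include <functional>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical incremental parser: hands each paragraph to the callback as
// soon as it is available instead of returning the whole document at the end.
void parse_incrementally(std::istream& in,
                         const std::function<void(const std::string&)>& on_paragraph) {
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty()) on_paragraph(line);
    }
}
```

The consumer can start indexing or displaying the first paragraph while the rest of the input is still being read, and memory use stays bounded by one paragraph rather than the whole document.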

Cross-Platform Compatibility: DocWire’s data extraction software is now compatible with multiple operating systems, including Linux, Windows, and macOS, with plans for expanding to more platforms in the future. Regardless of the operating system, users can seamlessly integrate DocToText into their applications, leveraging its capabilities to enhance data mining and analytics processes.

Flexible Parsing Process Design: DocToText 5.0.9 introduces a user-friendly and intuitive method for designing parsing processes. By connecting objects with the pipe (|) operator, users can easily construct parsing chains to achieve the desired data extraction workflow. The communication between parsing chain elements is based on Boost Signals, ensuring efficient and reliable data flow.
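
The pipe-based design can be sketched roughly as follows. This is a simplified model and not the SDK's classes: Chain, Element, Sink, skip_empty and add_prefix are hypothetical names, and plain std::function callbacks stand in for the Boost Signals connections mentioned above.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// A sink receives chunks; an element receives a chunk plus a sink to which it
// may emit zero or more chunks for the next element.
using Sink = std::function<void(const std::string&)>;
using Element = std::function<void(const std::string&, const Sink&)>;

class Chain {
public:
    void add(Element e) { elements_.push_back(std::move(e)); }

    // Pushes one chunk through all elements; whatever the last element emits
    // is delivered to `out`.
    void push(const std::string& chunk, const Sink& out) const {
        push_from(0, chunk, out);
    }

private:
    void push_from(std::size_t i, const std::string& chunk, const Sink& out) const {
        if (i == elements_.size()) {
            out(chunk);
            return;
        }
        elements_[i](chunk, [&](const std::string& next) { push_from(i + 1, next, out); });
    }

    std::vector<Element> elements_;
};

// `chain | element` appends the element and returns the extended chain.
Chain operator|(Chain chain, Element next) {
    chain.add(std::move(next));
    return chain;
}

// Example transformer: drops empty chunks by emitting nothing for them.
const Element skip_empty = [](const std::string& s, const Sink& emit) {
    if (!s.empty()) emit(s);
};

// Example transformer: marks every chunk with a prefix.
const Element add_prefix = [](const std::string& s, const Sink& emit) {
    emit("> " + s);
};
```

A chain is then built as Chain{} | skip_empty | add_prefix, and any callable with the Element signature can be added in the same way, which is the spirit of plugging custom importers, transformers, and exporters into the chain.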

Customizable Parsing Chain Elements: DocWire’s SDK allows users to customize the parsing chain by adding their own importers, transformers, and exporters. This flexibility ensures that the SDK can adapt to specific data extraction requirements, making it a highly versatile tool for a wide range of applications.

Compact Size and High Performance: We understand the importance of efficiency, which is why our SDK is designed with small binaries and fast native C++ code. This combination of compact size and high-performance ensures that data extraction tasks are executed swiftly and without compromising system resources.

Rebranding

Ferid Obeidat — Sun, 15 Jan 2023 18:09:50 -0000

NEWS RELEASE

Silvercoders Announce Rebranding to Docwire

The time-saving backbone of document processing: we provide the tools an organization needs to extract any type of data

Silvercoders, a company that has provided data extraction tools and services for years, has announced today that it has completed a significant rebranding. The CEO and the board took this step to reflect the company’s new mission and vision: helping businesses make their operations more effective by providing fast and dynamic document processing solutions.

The firm provided custom-made software focused on data extraction, mining, and document archiving from the beginning. Since then, the company has evolved into offering both standardized data extraction solutions and custom-made versions of existing products.
The management voted unanimously for the rebranding, as the changes extend beyond the name, website, and logo to establishing a future direction of growth. This entails additional products and services that the organization will focus on in the near future. The result is Docwire, which combines the quality and competence that Silvercoders always strove for with innovative solutions and an outlook that matches the new direction in which Docwire is heading.

For more, visit our website at www.docwire.io

DocToText version 4.0 was released today.

SILVERCODERS — Tue, 07 Jan 2014 02:50:24 -0000

DocToText version 4.0 was officially released today. After introducing PDF, iWork, XLSB, OpenDocument Flat XML and EML (email), this version of the utility supports all important document formats on the market. Support for Object Linking and Embedding (OLE) in ODF formats was added. Win64 is officially supported as of this version. The capabilities of the C API have been expanded significantly. There are also many fixes and improvements, including improvements for multithreaded applications.

DocToText version 0.13.0 released today.

SILVERCODERS — Fri, 19 Oct 2012 13:03:17 -0000

DocToText version 0.13.0 was officially released today. This is the first version available for Mac OS X and also the first version available as a C/C++ library in addition to the console application. MS PowerPoint binary format (PPT) support was added. Headers, footers and embedded XLS workbooks in DOC files are supported. Extracting text from OpenDocument and OOXML formats was significantly optimized. In addition, a lot of bugs were fixed in this version.